Abstract

Robust estimation of location and concentration parameters for the von Mises-Fisher distribution is discussed. A key reparametrisation is achieved by expressing the two parameters as one vector on the Euclidean space. With this representation, we first show that the maximum likelihood estimator for the von Mises-Fisher distribution is not robust in some situations. Then we propose two families of robust estimators which can be derived as minimisers of two density power divergences. The presented families enable us to estimate both location and concentration parameters simultaneously. Some properties of the estimators are explored. Simple iterative algorithms are suggested to find the estimates numerically. A comparison with existing robust estimators is given, as well as a discussion of the differences and similarities between the two proposed estimators. A simulation study is made to evaluate the finite sample performance of the estimators. We consider a sea star dataset and discuss the selection of the tuning parameters and outlier detection.



1 Introduction 

Observations which take values on the $p$-dimensional unit sphere arise in various scientific fields. In meteorology, for example, wind directions measured at a weather station (Johnson and Wehrly, 1977) can be considered two-dimensional spherical or, simply, circular data. Other examples include directions of the magnetic field in a rock sample (Stephens, 1979), which can be expressed as unit vectors on the three-dimensional sphere.



For the analysis of spherical data, several probability distributions have been proposed in the literature. Among them, the model which has played a central role is the von Mises-Fisher distribution, which is also called the Langevin distribution. It has density

$$
f_{\mu,\kappa}(x) = \frac{\kappa^{(p-2)/2}}{(2\pi)^{p/2} I_{(p-2)/2}(\kappa)} \exp(\kappa \mu' x), \qquad x \in S^p,
$$

with respect to surface area on the sphere, where $\mu \in S^p$, $\kappa > 0$, $S^p = \{x \in \mathbb{R}^p ; \|x\| = 1\}$, $y'$ is the transpose of $y$, and $I_q(\cdot)$ denotes the modified Bessel function of the first kind and order $q$ (Gradshteyn and Ryzhik, 2007, Equations (8.431) and (8.445)). The parameter $\mu$ controls the centre of rotational symmetry, or the mean direction, of the distribution, while the other parameter $\kappa$ determines the concentration of the model. The distribution is unimodal and rotationally symmetric about $x = \mu$. See Watson (1983), Fisher et al. (1987) and Mardia and Jupp (1999) for book treatments of the model.

Although numerous works on robust estimation for models for $\mathbb{R}^p$-valued data have appeared in the literature, considerably less attention has been paid to robust estimation for models for data on a bounded space. A typical example is the $p$-dimensional sphere, which shows some features different from those of the usual linear space. Since the unit sphere is a compact set, the gross error sensitivity of the maximum likelihood estimator is bounded. However, as pointed out, for example, in Watson (1983) and discussed later in this paper, there is a strong need for robust estimation for spherical data, especially when observations are concentrated toward a certain direction.

There has been some discussion of robust estimation of the parameters of the von Mises-Fisher distribution in the literature. Robust estimators of the location parameter $\mu$ for the circular, or two-dimensional, case were proposed by Mardia (1972, p. 28) and Lenth (1981). Fisher (1985), Ducharme and Milasevic (1987) and Chan and He (1993) discussed the estimation of $\mu$ for the general dimensional case. The estimation of the concentration parameter $\kappa$ was considered by Fisher (1982), Ducharme and Milasevic (1990) and Ko (1992).

As described above, most of these existing works concern robust estimation of either the location or the concentration parameter of the von Mises-Fisher distribution. However, comparatively little work has been done to estimate both location and concentration parameters simultaneously. Lenth (1981) briefly discussed a numerical algorithm which estimates both parameters for the circular case. A nonparametric approach to estimation for the circular case is taken in Agostinelli (2007). To our knowledge, robust estimation of both parameters for the general dimensional case has never been considered before.

In this paper we propose two families of robust estimators of both location and concentration parameters for the general dimensional von Mises-Fisher distribution. To achieve this, we first reparametrise the model so that its parameters can be expressed as one $\mathbb{R}^p$-valued parameter, and then derive the estimators as minimisers of the density power divergences developed by Basu et al. (1998) or Jones et al. (2001). These approaches enable us to estimate both location and concentration parameters simultaneously. With this parametrisation, some measures of robustness of estimators, such as the influence function, are discussed. To estimate the parameters numerically, we provide simple iterative algorithms. Some desirable properties such as consistency and asymptotic normality hold for the proposed estimators. Influence functions and asymptotic covariance matrices are available, and it is shown that they can be expressed using only the modified Bessel functions of the first kind if the distribution underlying the data is a mixture of von Mises-Fisher distributions.

Subsequent sections are organised as follows. In Section 2 we discuss maximum likelihood estimation for the von Mises-Fisher distribution and show some problems with the robustness of the estimator. We also briefly consider what an outlier is for spherical data and provide the motivation for our study. In Sections 3 and 4, we propose two classes of robust estimators of location and concentration parameters and discuss their properties. A comparison among the two proposed estimators and an existing estimator of Lenth (1981) is made in Section 5. In Section 6 a simulation study is given to compare the finite sample performance of the proposed estimators. In Section 7 a sea star dataset is considered to illustrate how our estimators can be utilised to estimate the parameters and detect outliers, and a prescription for choosing the tuning parameters is given. Finally, concluding remarks are made in Section 8.



2 The von Mises-Fisher distribution

2.1 Reparametrisation 

Before we embark on a discussion of robust estimators, we consider the parametrisation of the von Mises-Fisher distribution. In most of the literature on this model, the parameters are represented as a unit vector $\mu$ and a scalar $\kappa$. Each parameter has a clear-cut interpretation: the parameter $\mu$ controls the mean direction of the model, while $\kappa$ determines the concentration.

In this paper, however, for the sake of discussion on robustness of the estimator, we consider the following reparametrisation:

$$
\xi = \kappa \mu .
$$

Clearly, $\xi$ takes a value in $\mathbb{R}^p$. It is easy to see that the Euclidean norm of $\xi$, $\|\xi\|$, represents the concentration of the model, while the standardised vector, $\xi/\|\xi\|$, denotes the mean direction. Then the density of the von Mises-Fisher distribution can be written as

$$
f_\xi(x) = \frac{\|\xi\|^{(p-2)/2}}{(2\pi)^{p/2} I_{(p-2)/2}(\|\xi\|)} \exp(\xi' x), \qquad x \in S^p. \tag{1}
$$

For brevity, write $X \sim \mathrm{vM}_p(\xi)$ if an $S^p$-valued random variable $X$ follows a distribution with density (1). With this convention, it is easier to evaluate how an outlier influences the estimators of both location and concentration parameters. For example, the influence function, which is commonly used to discuss robustness, is more interpretable if the parameter is expressed in this manner. See Sections 2.3, 3.3 and 4.3 for details. Throughout the paper we denote the density (1) by $f_\xi$.
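As a numerical companion to density (1), the following minimal sketch evaluates $\log f_\xi(x)$, assuming NumPy and SciPy are available; the function names are illustrative and not from the paper. It uses the exponentially scaled Bessel function `scipy.special.ive` so that large $\|\xi\|$ does not overflow, and it treats $\xi = 0$ as the uniform distribution on $S^p$, which is the limit of (1).

```python
import numpy as np
from scipy.special import gammaln, ive


def vmf_log_normaliser(xi):
    """log of ||xi||^{(p-2)/2} / {(2 pi)^{p/2} I_{(p-2)/2}(||xi||)} in density (1)."""
    xi = np.asarray(xi, dtype=float)
    p = xi.size
    kappa = np.linalg.norm(xi)
    nu = (p - 2) / 2.0
    if kappa == 0.0:
        # Limit kappa -> 0: uniform density, 1 / (surface area of the unit sphere in R^p).
        return gammaln(p / 2.0) - np.log(2.0) - (p / 2.0) * np.log(np.pi)
    # ive(nu, k) = I_nu(k) * exp(-k), hence log I_nu(k) = log ive(nu, k) + k.
    log_bessel = np.log(ive(nu, kappa)) + kappa
    return nu * np.log(kappa) - (p / 2.0) * np.log(2.0 * np.pi) - log_bessel


def vmf_log_density(X, xi):
    """log f_xi(x) for each row x of X (rows are assumed to be unit vectors)."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    return vmf_log_normaliser(xi) + X @ np.asarray(xi, dtype=float)
```

For $p = 3$ the normalising constant reduces to $\|\xi\|/(4\pi\sinh\|\xi\|)$, which gives a quick sanity check of `vmf_log_normaliser`.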



2.2 Maximum likelihood estimation 



In this subsection we discuss maximum likelihood estimation for the von Mises-Fisher distribution. Let $X_1, \ldots, X_n$ be random samples from $\mathrm{vM}_p(\xi)$. Then the maximum likelihood estimator of $\xi$ is known to be

$$
\hat{\xi} = A_p^{-1}\!\left( \frac{\|\sum_{i=1}^n X_i\|}{n} \right) \frac{\sum_{i=1}^n X_i}{\|\sum_{i=1}^n X_i\|}, \tag{2}
$$

where $A_p(x) = I_{p/2}(x)/I_{(p-2)/2}(x)$, $x \in [0, \infty)$. See, for example, Mardia and Jupp (1999, Section 10.3.1) for the derivation of the estimator. The following hold for $A_p$:

(i) $A_p(0) = 0$ and $\lim_{x \to \infty} A_p(x) = 1$,

(ii) $A_p(x)$ is strictly increasing with respect to $x$.

See Watson (1983, Appendix A2) for proofs. From this result, it follows that there exists a unique solution $\hat{\xi}$ which satisfies (2). These properties also make it easy to compute the inverse function, i.e. $x = A_p^{-1}(y)$, numerically.

Maximum likelihood estimation is associated with minimum divergence estimation based on the Kullback-Leibler divergence. Let $f_\xi$ be density (1) and $G$ the distribution underlying the data, having density $g$. Then the Kullback-Leibler divergence between $g$ and $f_\xi$ is defined as

$$
d_{KL}(g, f_\xi) = \int \log(g/f_\xi)\, dG(x). \tag{3}
$$

Here and in many expressions in this paper, we omit the variable of integration. If we assume that $G$ is the empirical distribution function, i.e., $G = G_n(X_1, \ldots, X_n)$, then the minimiser of the divergence, $\mathrm{argmin}_{\xi \in \mathbb{R}^p} d_{KL}(g, f_\xi)$, is the same as the maximum likelihood estimator (2).
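The monotonicity of $A_p$ makes (2) straightforward to compute. The minimal sketch below, assuming NumPy and SciPy, evaluates $A_p$ through `scipy.special.ive` (the exponential scaling cancels in the ratio), inverts it by expanding a bracket and then applying Brent's method (`scipy.optimize.brentq`), and returns the maximum likelihood estimate (2). The function names and the bracketing strategy are illustrative choices, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import ive


def A(p, x):
    """A_p(x) = I_{p/2}(x) / I_{(p-2)/2}(x), with A_p(0) = 0."""
    if x == 0.0:
        return 0.0
    return ive(p / 2.0, x) / ive((p - 2) / 2.0, x)


def A_inv(p, y):
    """Solve A_p(x) = y for x >= 0; A_p is strictly increasing with range [0, 1)."""
    if not 0.0 <= y < 1.0:
        raise ValueError("A_p^{-1} is only defined on [0, 1)")
    if y == 0.0:
        return 0.0
    hi = 1.0
    while A(p, hi) < y:  # expand the bracket until it contains the root
        hi *= 2.0
    return brentq(lambda t: A(p, t) - y, 0.0, hi)


def vmf_mle(X):
    """Maximum likelihood estimate (2) of xi from the rows of X (unit vectors in R^p)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    S = X.sum(axis=0)
    R = np.linalg.norm(S)
    return A_inv(p, R / n) * S / R
```

The same two helpers, `A` and `A_inv`, are the only special-function machinery needed by the iterative algorithms later in the paper.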

2.3 Influence function of the maximum likelihood estimator 

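The influence function of the maximum likelihood estimator (2) for the reparametrised von Mises-Fisher distribution (1) is given in the following theorem. See the Appendix for the proof.

Theorem 1. The influence function of the maximum likelihood estimator $\hat{\xi}$ at $G$ is given by

$$
IF(G, x) = \{M(\xi)\}^{-1} \left\{ x - A_p(\|\xi\|) \frac{\xi}{\|\xi\|} \right\}, \tag{4}
$$

where

$$
M(\xi) = \frac{A_p(\|\xi\|)}{\|\xi\|} \mathrm{Id}_p + \left\{ 1 - A_p^2(\|\xi\|) - \frac{p A_p(\|\xi\|)}{\|\xi\|} \right\} \frac{\xi \xi'}{\|\xi\|^2}
$$

is the Fisher information matrix of (1) and $\mathrm{Id}_p$ denotes the $p \times p$ identity matrix.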
Note that the above influence function is different from the ones given by Wehrly and Shine and by Ko and Guttorp. Their papers discuss the influence functions of the estimators of the location and concentration parameters separately, whereas we treat these two m.l.e.'s as 'one estimator of one parameter' and discuss its influence function.

Given the influence function in Theorem 1, a natural question concerns the gross error sensitivity. Because the unit sphere is a compact set, it is clear that the gross error sensitivity of estimator (2) is bounded. Nevertheless, the following result points out the need for robust estimation for the model defined on this special manifold. The proof is straightforward from Theorem 1 and omitted.

Theorem 2. The following properties hold for the maximum likelihood estimator (2):

(i) For any $\xi \in \mathbb{R}^p \setminus \{0\}$, it holds that

$$
\mathop{\mathrm{argmax}}_{x \in S^p} \|IF(G, x)\| = -\frac{\xi}{\|\xi\|} \quad \text{and} \quad \mathop{\mathrm{argmin}}_{x \in S^p} \|IF(G, x)\| = \frac{\xi}{\|\xi\|}.
$$

(ii) Let $\xi/\|\xi\|$ be a fixed vector. Then

$$
\lim_{\|\xi\| \to \infty} \left\{ \sup_{x \in S^p} \|IF(G, x)\| - \inf_{x \in S^p} \|IF(G, x)\| \right\} = \infty
$$

and

$$
\lim_{\|\xi\| \to \infty} \left\{ \sup_{x \in S^p} \|IF(G, x)\| \Big/ \inf_{x \in S^p} \|IF(G, x)\| \right\} = \infty.
$$



This result implies that we need to develop a robust method to estimate $\xi$ when the concentration of the distribution, $\|\xi\|$, is large.

* * * Figure 1 about here * * * 

Figure 1 plots the influence functions (4) of the maximum likelihood estimator for some selected values of $\xi$. It seems that the direction of the influence function is close to that of $x$ for small $\|\xi\|$, whereas it is strongly attracted towards $-\xi/\|\xi\|$ when $\|\xi\|$ is large. The figure also suggests that, for small $\|\xi\|$, the range of the norms of the influence functions is fairly narrow; the greater the value of $\|\xi\|$, the wider the range of the norm. As Theorem 2 shows, the norm of the influence function is maximised if $x = -\xi/\|\xi\|$. Also, it can be confirmed that the norms of the influence functions tend to infinity as $\|\xi\|$ approaches infinity. This provides strong motivation for robust estimation of the parameter of the von Mises-Fisher distribution.
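The qualitative behaviour described above can be explored numerically. The minimal sketch below, assuming NumPy and SciPy, evaluates the influence function (4) with $M(\xi)$ as in Theorem 1 on a grid of points of the unit circle; the helper names are illustrative.

```python
import numpy as np
from scipy.special import ive


def A(p, x):
    """A_p(x) = I_{p/2}(x) / I_{(p-2)/2}(x) (the exponential scaling of ive cancels)."""
    return ive(p / 2.0, x) / ive((p - 2) / 2.0, x)


def fisher_information(xi):
    """M(xi) of Theorem 1, for xi != 0."""
    xi = np.asarray(xi, dtype=float)
    p = xi.size
    k = np.linalg.norm(xi)
    mu = xi / k
    a = A(p, k)
    return (a / k) * np.eye(p) + (1.0 - a**2 - p * a / k) * np.outer(mu, mu)


def mle_influence(X, xi):
    """Rows are IF(G, x) = M(xi)^{-1} {x - A_p(||xi||) xi/||xi||} for each row x of X."""
    xi = np.asarray(xi, dtype=float)
    k = np.linalg.norm(xi)
    score = np.atleast_2d(X) - A(xi.size, k) * xi / k
    return np.linalg.solve(fisher_information(xi), score.T).T


# The range of the norms widens quickly as ||xi|| grows.
t = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
circle = np.column_stack([np.cos(t), np.sin(t)])
for k in (0.5, 2.0, 8.0):
    norms = np.linalg.norm(mle_influence(circle, [k, 0.0]), axis=1)
    print(k, norms.min(), norms.max())
```

Solving a linear system instead of forming $\{M(\xi)\}^{-1}$ explicitly is a standard numerical choice; for the small matrices involved here either works.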



2.4 Outliers in directional data 

Since a unit sphere is a compact set, unlike the case of linear data, it is not clear what an outlier is in directional data. For example, if the distribution underlying the data is the uniform distribution on the sphere, it seems difficult to identify an outlier. However, if a distribution is highly concentrated, then an outlier can be defined in a similar manner as for linear data.

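Here we consider an area where a sample from the von Mises-Fisher density (1) is not likely to be observed. Let $\alpha$ $(\in [0, 1])$ be a probability which determines the size of the area. The area, which we denote by $Ar_p$, is defined as the intersection of the unit sphere $S^p$ and the sphere with centre at $-\xi/\|\xi\|$, namely,

$$
Ar_p = \left\{ x \in S^p ; \left\| x + \frac{\xi}{\|\xi\|} \right\| < \{2(1 - \cos\delta)\}^{1/2} \right\}.
$$

Here $\delta = \delta(\alpha)$ $(\in [0, \pi))$ is the solution of the following equation

$$
\int_{Ar_p} f_\xi(x)\, dx = \alpha .
$$

Since $\|x + \xi/\|\xi\|\|^2 = 2(1 + \xi'x/\|\xi\|)$ for $x \in S^p$, the condition defining $Ar_p$ is equivalent to $\xi'x/\|\xi\| < -\cos\delta$. The left-hand side of the equation can therefore be simplified to

$$
\begin{aligned}
\int_{Ar_p} f_\xi(x)\, dx
&= \int_{\xi'x/\|\xi\| < -\cos\delta} f_\xi(x)\, dx \\
&= \frac{\|\xi\|^{(p-2)/2}}{(2\pi)^{p/2} I_{(p-2)/2}(\|\xi\|)} \int_{\pi-\delta}^{\pi} \exp(\|\xi\| \cos\theta_1) \sin^{p-2}\theta_1 \, d\theta_1 \\
&\qquad \times \int_0^{2\pi} \int_0^{\pi} \cdots \int_0^{\pi} \sin^{p-3}\theta_2 \cdots \sin\theta_{p-2} \, d\theta_2 \cdots d\theta_{p-1} \\
&= \frac{(\|\xi\|/2)^{(p-2)/2}}{\Gamma(\frac{1}{2})\, \Gamma\{\frac{1}{2}(p-1)\}\, I_{(p-2)/2}(\|\xi\|)} \int_{-1}^{-\cos\delta} e^{\|\xi\| t} (1 - t^2)^{(p-3)/2} \, dt .
\end{aligned} \tag{5}
$$

Hence, it follows that $\delta$ can be obtained as the solution of the following integral equation

$$
\int_{-1}^{-\cos\delta} e^{\|\xi\| t} (1 - t^2)^{(p-3)/2} \, dt = \pi^{1/2} \alpha\, I_{(p-2)/2}(\|\xi\|)\, \Gamma\{\tfrac{1}{2}(p-1)\} \left( \frac{\|\xi\|}{2} \right)^{-(p-2)/2}. \tag{6}
$$

Since the integral on the left-hand side is bounded and strictly increasing with respect to $\delta$, it is possible to find the unique solution $\delta$ numerically.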



* * * Figure 2 about here * * *

Figure 2(a) demonstrates how the area $Ar_2$ is determined from density (1) and probability $\alpha$. From this frame, it can be easily understood that the area $Ar_2$ or, equivalently, $(-\delta, \delta)$ increases as $\alpha$ increases. Figure 2(b) plots how $\|\xi\|$ influences the size of the area (5). As this frame clearly shows, the size of the area, which is monotonically increasing with respect to $\delta$, increases as $\|\xi\|$ increases. It can be confirmed that, for any $p$, as $\|\xi\|$ tends to infinity, $\delta$ approaches $\pi$, meaning that a sample is likely to be observed only in a neighbourhood of $x = \xi/\|\xi\|$. Therefore we conclude that robust estimation for the von Mises-Fisher distribution is necessary, especially when the parameter $\|\xi\|$ is large. This statement is also supported by the discussion of Figure 1 in the previous subsection.
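Equation (6) is one-dimensional and easy to solve. The minimal sketch below, assuming NumPy and SciPy, computes the probability of $Ar_p$ through the $\theta_1$-form in (5), which avoids the endpoint singularity of $(1-t^2)^{(p-3)/2}$ when $p = 2$, moves the factor $e^{-\|\xi\|}$ into the integrand to avoid overflow, and finds $\delta(\alpha)$ with Brent's method. The function names are illustrative.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq
from scipy.special import gammaln, ive


def area_probability(delta, kappa, p):
    """P(X in Ar_p) for X ~ vM_p(xi) with ||xi|| = kappa > 0, via the theta_1-form of (5)."""
    nu = (p - 2) / 2.0
    # log of (kappa/2)^nu / {Gamma(1/2) Gamma((p-1)/2) I_nu(kappa)} + kappa, using
    # I_nu(kappa) = ive(nu, kappa) * e^kappa; the e^{-kappa} goes into the integrand.
    log_const = (nu * np.log(kappa / 2.0) - 0.5 * np.log(np.pi)
                 - gammaln((p - 1) / 2.0) - np.log(ive(nu, kappa)))
    integrand = lambda th: np.exp(kappa * (np.cos(th) - 1.0)) * np.sin(th) ** (p - 2)
    value, _ = quad(integrand, np.pi - delta, np.pi)
    return np.exp(log_const) * value


def solve_delta(alpha, kappa, p):
    """delta(alpha) in (0, pi) solving (6); the probability is increasing in delta."""
    return brentq(lambda d: area_probability(d, kappa, p) - alpha, 1e-12, np.pi)
```

For $p = 3$ the probability has the closed form $(e^{-\|\xi\|\cos\delta} - e^{-\|\xi\|})/(2\sinh\|\xi\|)$, which is convenient for checking the sketch and shows directly that, for fixed $\alpha$, $\delta$ grows with $\|\xi\|$.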



3 Minimum divergence estimator of Basu et al. (1998)



In this section we propose a family of estimators of the parameter of the von Mises-Fisher distribution. Our estimator can be derived as a minimiser of the divergence proposed by Basu et al. (1998). An iterative algorithm is presented to estimate the parameter numerically. The influence function and asymptotic distribution of the estimator are considered.



3.1 The divergence of Basu et al. (1998)



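Let $f_\theta$ be a parametric density and $g$ a density underlying the data. Basu et al. (1998) define the density power divergence between $g$ and $f_\theta$ to be

$$
d_\beta(g, f_\theta) = \frac{1}{\beta(1+\beta)} \int g^{1+\beta}\, dx - \frac{1}{\beta} \int g f_\theta^{\beta}\, dx + \frac{1}{1+\beta} \int f_\theta^{1+\beta}\, dx, \qquad \beta > 0, \tag{7}
$$

$$
d_0(g, f_\theta) = \lim_{\beta \to 0} d_\beta(g, f_\theta) = \int g \log(g/f_\theta)\, dx .
$$

This divergence is called the $\beta$-divergence, as seen in Minami and Eguchi (2002) and Fujisawa and Eguchi (2006).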

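The divergence between the von Mises-Fisher density and a density underlying the data is given in the following theorem. The proof is given in the Appendix.

Theorem 3. Let $f_\theta$ in (7) be the von Mises-Fisher $\mathrm{vM}_p(\xi)$ density. Then the Basu et al. divergence is

$$
\begin{aligned}
d_\beta(g, f_\xi) ={}& \frac{1}{\beta(1+\beta)} \int g^{1+\beta}\, dx - \frac{1}{\beta} \left\{ \frac{\|\xi\|^{(p-2)/2}}{(2\pi)^{p/2} I_{(p-2)/2}(\|\xi\|)} \right\}^{\beta} \int \exp(\beta \xi' x)\, g(x)\, dx \\
&+ \frac{\|\xi\|^{\beta(p-2)/2} I_{(p-2)/2}\{(1+\beta)\|\xi\|\}}{(2\pi)^{\beta p/2} (1+\beta)^{p/2} I_{(p-2)/2}^{1+\beta}(\|\xi\|)}, \qquad \beta > 0.
\end{aligned} \tag{8}
$$

If $\beta$ equals 0, then $d_0(g, f_\xi) = d_{KL}(g, f_\xi)$, where $d_{KL}(g, f_\xi)$ is as in (3).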
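With $g$ replaced by the empirical density, the middle term of (8) becomes a sample average and the first term does not depend on $\xi$, so (8) gives an objective that can be evaluated directly. The minimal sketch below, assuming NumPy and SciPy (function names are illustrative), drops the $\xi$-free first term, works on the log scale for stability, and lets the closed-form third term be compared with numerical integration on the circle.

```python
import numpy as np
from scipy.special import ive, logsumexp


def log_c(xi):
    """log of ||xi||^{(p-2)/2} / {(2 pi)^{p/2} I_{(p-2)/2}(||xi||)}, for xi != 0."""
    xi = np.asarray(xi, dtype=float)
    p, k = xi.size, np.linalg.norm(xi)
    nu = (p - 2) / 2.0
    return nu * np.log(k) - (p / 2.0) * np.log(2.0 * np.pi) - (np.log(ive(nu, k)) + k)


def basu_terms(X, xi, beta):
    """Second and third terms of (8) with g the empirical density of the rows of X."""
    xi = np.asarray(xi, dtype=float)
    n = X.shape[0]
    # -(1/beta) C^beta (1/n) sum_j exp(beta xi'x_j), computed on the log scale
    second = -np.exp(beta * log_c(xi) + logsumexp(beta * (X @ xi)) - np.log(n)) / beta
    # (1/(1+beta)) int f^{1+beta} dx = C(xi)^{1+beta} / {(1+beta) C((1+beta) xi)}
    third = np.exp((1.0 + beta) * log_c(xi) - log_c((1.0 + beta) * xi)) / (1.0 + beta)
    return second, third


def basu_objective(X, xi, beta):
    """d_beta(g_n, f_xi) of (8) up to the xi-free term (1/(beta(1+beta))) int g^{1+beta} dx."""
    second, third = basu_terms(X, xi, beta)
    return second + third
```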
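As a density underlying the data, consider the following mixture model

$$
g(x) = (1 - \varepsilon) f_\xi(x) + \varepsilon f_\eta(x), \tag{9}
$$

where $0 < \varepsilon < 1$ and $f_\eta$ denotes the von Mises-Fisher $\mathrm{vM}_p(\eta)$ density. In this case the divergence can be expressed as

$$
\begin{aligned}
d_\beta(g, f_\xi) ={}& \frac{1}{\beta(1+\beta)} \int \{(1-\varepsilon) f_\xi + \varepsilon f_\eta\}^{1+\beta}\, dx \\
&- \frac{1}{\beta} \left\{ \frac{\|\xi\|^{(p-2)/2}}{(2\pi)^{p/2} I_{(p-2)/2}(\|\xi\|)} \right\}^{\beta} \left[ \frac{(1-\varepsilon) I_{(p-2)/2}\{(1+\beta)\|\xi\|\}}{(1+\beta)^{(p-2)/2} I_{(p-2)/2}(\|\xi\|)} + \frac{\varepsilon \|\eta\|^{(p-2)/2} I_{(p-2)/2}(\|\beta\xi + \eta\|)}{\|\beta\xi + \eta\|^{(p-2)/2} I_{(p-2)/2}(\|\eta\|)} \right] \\
&+ \frac{\|\xi\|^{\beta(p-2)/2} I_{(p-2)/2}\{(1+\beta)\|\xi\|\}}{(2\pi)^{\beta p/2} (1+\beta)^{p/2} I_{(p-2)/2}^{1+\beta}(\|\xi\|)} .
\end{aligned}
$$

Note that the second and third terms of the divergence can be expressed by using only the modified Bessel functions of the first kind. In general, the first term should be evaluated numerically. If $\beta$ is an integer, then the first term is of the form

$$
\int \{(1-\varepsilon) f_\xi + \varepsilon f_\eta\}^{1+\beta}\, dx = \sum_{k=0}^{1+\beta} \binom{1+\beta}{k} (1-\varepsilon)^k \varepsilon^{1+\beta-k} \frac{\|\xi\|^{k(p-2)/2} \|\eta\|^{(1+\beta-k)(p-2)/2}\, I_{(p-2)/2}(\|k\xi + (1+\beta-k)\eta\|)}{(2\pi)^{\beta p/2} \|k\xi + (1+\beta-k)\eta\|^{(p-2)/2} I_{(p-2)/2}^{k}(\|\xi\|)\, I_{(p-2)/2}^{1+\beta-k}(\|\eta\|)} .
$$

It is remarked here that, in this case, the first term also does not involve any special functions other than the modified Bessel functions of the first kind.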
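The binomial expansion above can be checked numerically. The minimal sketch below, assuming NumPy and SciPy (names are illustrative), evaluates it for integer $\beta$ on the log scale; the case $k\xi + (1+\beta-k)\eta = 0$ is handled through the uniform-density limit of the normalising constant, and the result can be compared with direct quadrature on the circle ($p = 2$).

```python
import numpy as np
from scipy.special import comb, gammaln, ive


def log_c(theta):
    """log normalising constant of vM_p(theta); theta = 0 gives the uniform density on S^p."""
    theta = np.asarray(theta, dtype=float)
    p, k = theta.size, np.linalg.norm(theta)
    nu = (p - 2) / 2.0
    if k == 0.0:
        return gammaln(p / 2.0) - np.log(2.0) - (p / 2.0) * np.log(np.pi)
    return nu * np.log(k) - (p / 2.0) * np.log(2.0 * np.pi) - (np.log(ive(nu, k)) + k)


def mixture_power_integral(xi, eta, eps, beta):
    """int {(1-eps) f_xi + eps f_eta}^{1+beta} dx for a non-negative integer beta."""
    xi, eta = np.asarray(xi, dtype=float), np.asarray(eta, dtype=float)
    m = beta + 1
    total = 0.0
    for k in range(m + 1):
        # int f_xi^k f_eta^{m-k} dx = C(xi)^k C(eta)^{m-k} / C(k xi + (m-k) eta)
        log_term = k * log_c(xi) + (m - k) * log_c(eta) - log_c(k * xi + (m - k) * eta)
        total += comb(m, k, exact=True) * (1 - eps) ** k * eps ** (m - k) * np.exp(log_term)
    return total
```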



3.2 Estimating equation 

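The estimating equation derived from the Basu et al. (1998) divergence is known to be

$$
\int \psi_\beta(x, \xi)\, dG(x) = 0, \tag{10}
$$

where

$$
\psi_\beta(x, \xi) = f_\xi^{\beta}(x)\, u_\xi(x) - \int f_\xi^{1+\beta}(y)\, u_\xi(y)\, dy \quad \text{and} \quad u_\xi(x) = \frac{\partial}{\partial \xi} \log f_\xi(x) = x - A_p(\|\xi\|) \frac{\xi}{\|\xi\|}.
$$

Following the convention in Jones et al. (2001), we call the solution of this estimating equation the type 1 estimator. From the general theory, it immediately follows that the estimator is consistent for $\xi$.

Theorem 4. The function $\psi_\beta(x, \xi)$ can be expressed as

$$
\psi_\beta(x, \xi) = C^{\beta} \Bigl[ \Bigl\{ x - A_p(\|\xi\|) \frac{\xi}{\|\xi\|} \Bigr\} \exp(\beta \xi' x) - \frac{I_{(p-2)/2}\{(1+\beta)\|\xi\|\}}{(1+\beta)^{(p-2)/2} I_{(p-2)/2}(\|\xi\|)} \bigl[ A_p\{(1+\beta)\|\xi\|\} - A_p(\|\xi\|) \bigr] \frac{\xi}{\|\xi\|} \Bigr],
$$

where $C$ is the normalising constant of the von Mises-Fisher $\mathrm{vM}_p(\xi)$ density, i.e., $C = \|\xi\|^{(p-2)/2} / \{(2\pi)^{p/2} I_{(p-2)/2}(\|\xi\|)\}$.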
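See the Appendix for the proof. From this form, it immediately follows that Fisher consistency holds for the estimator. For simplicity, we redefine the $\psi$-function as

$$
\tilde{\psi}_\beta(x, \xi) = C^{-\beta} \psi_\beta(x, \xi). \tag{11}
$$

Then Equation (10) holds with $\psi_\beta$ replaced by $\tilde{\psi}_\beta$. Since $C$ does not depend on $x$, it is clear that M-estimation based upon $\tilde{\psi}_\beta$ is essentially the same as the one based on $\psi_\beta$.

Substituting an empirical distribution $G_n(X_1, \ldots, X_n)$ for $G$ in Equation (10), in which $\psi_\beta$ is replaced by $\tilde{\psi}_\beta$, we have

$$
\sum_{j=1}^n \Bigl\{ x_j - A_p(\|\xi\|) \frac{\xi}{\|\xi\|} \Bigr\} \exp(\beta \xi' x_j) - n \frac{I_{(p-2)/2}\{(1+\beta)\|\xi\|\}}{(1+\beta)^{(p-2)/2} I_{(p-2)/2}(\|\xi\|)} \bigl[ A_p\{(1+\beta)\|\xi\|\} - A_p(\|\xi\|) \bigr] \frac{\xi}{\|\xi\|} = 0 .
$$

Thus the estimator which minimises divergence (7) satisfies the following relationship:

$$
\xi = A_p^{-1}\!\left\{ \frac{\|\sum_j w_{j,\beta}(\xi)\, x_j\| - n \|\xi\|\, c_\beta(\xi)}{\sum_j w_{j,\beta}(\xi)} \right\} \frac{\sum_j w_{j,\beta}(\xi)\, x_j}{\|\sum_j w_{j,\beta}(\xi)\, x_j\|}, \tag{12}
$$

where $w_{j,\beta}(\xi) = \exp(\beta \xi' x_j)$ and

$$
c_\beta(\xi) = \frac{I_{(p-2)/2}\{(1+\beta)\|\xi\|\} \bigl[ A_p\{(1+\beta)\|\xi\|\} - A_p(\|\xi\|) \bigr]}{(1+\beta)^{(p-2)/2} I_{(p-2)/2}(\|\xi\|)\, \|\xi\|} .
$$

Note that, since $A_p(x)$ $(= y)$ is strictly increasing with respect to $x$, there exists a unique solution $x$ satisfying $x = A_p^{-1}(y)$.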

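Then an algorithm induced from the above relationship is suggested as follows.

Algorithm for the type 1 estimate

Step 1. Take an initial value $\xi_0$.

Step 2. Compute $\xi_1, \ldots, \xi_N$ as follows until the estimate $\xi_N$ remains virtually unchanged from the previous estimate $\xi_{N-1}$:

$$
\xi_{t+1} = A_p^{-1}\!\left\{ \frac{\|\sum_j w_{j,\beta}(\xi_t)\, x_j\| - n \|\xi_t\|\, c_\beta(\xi_t)}{\sum_j w_{j,\beta}(\xi_t)} \right\} \frac{\sum_j w_{j,\beta}(\xi_t)\, x_j}{\|\sum_j w_{j,\beta}(\xi_t)\, x_j\|}, \qquad t = 0, 1, \ldots, N-1,
$$

where $w_{j,\beta}$ and $c_\beta$ are as in (12).

Step 3. Record $\xi_N$ as an estimate of $\xi$.

Our simulation study implies that the above algorithm converges if an initial value is set properly and $\beta$ is not too large. As an initial value, the maximum likelihood estimator (2) may be one promising choice.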



The tuning parameter $\beta$ can be estimated by using cross-validation (Hastie et al., 2009, Section 7.10.1). We will discuss the selection of $\beta$ in more detail in Section 7.
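The iteration in Step 2 can be sketched in Python as follows, assuming NumPy and SciPy. The weights $w_{j,\beta}$ are rescaled by $\exp(-\beta\|\xi_t\|)$ for numerical stability, and the correction term $n\|\xi_t\|c_\beta(\xi_t)$ is rescaled by the same factor, which leaves the argument of $A_p^{-1}$ unchanged. The stopping rule, iteration cap and function names are illustrative choices, and the maximum likelihood estimator (2) is used as the default initial value, as suggested above.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import ive


def A(p, x):
    return 0.0 if x == 0.0 else ive(p / 2.0, x) / ive((p - 2) / 2.0, x)


def A_inv(p, y):
    hi = 1.0
    while A(p, hi) < y:
        hi *= 2.0
    return brentq(lambda t: A(p, t) - y, 0.0, hi)


def vmf_mle(X):
    S = X.sum(axis=0)
    R = np.linalg.norm(S)
    return A_inv(X.shape[1], R / X.shape[0]) * S / R


def type1_estimate(X, beta, xi0=None, tol=1e-10, max_iter=1000):
    """Fixed-point iteration (12) for the type 1 (Basu et al.) estimate of xi."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    nu = (p - 2) / 2.0
    xi = vmf_mle(X) if xi0 is None else np.asarray(xi0, dtype=float)
    for _ in range(max_iter):
        k = np.linalg.norm(xi)
        w = np.exp(beta * (X @ xi - k))            # w_{j,beta}(xi) * exp(-beta k)
        S = w @ X
        # n ||xi|| c_beta(xi) * exp(-beta k); note I_nu((1+b)k) e^{-b k} / I_nu(k) is a ratio of ive's
        corr = (n * ive(nu, (1 + beta) * k) / ive(nu, k) / (1 + beta) ** nu
                * (A(p, (1 + beta) * k) - A(p, k)))
        arg = (np.linalg.norm(S) - corr) / w.sum()
        if not 0.0 < arg < 1.0:
            raise RuntimeError("argument of A_p^{-1} left (0, 1); try another xi0 or a smaller beta")
        xi_new = A_inv(p, arg) * S / np.linalg.norm(S)
        if np.linalg.norm(xi_new - xi) < tol * (1.0 + np.linalg.norm(xi)):
            return xi_new
        xi = xi_new
    raise RuntimeError("no convergence")
```

With $\beta = 0$ the correction term vanishes and the first step reproduces (2), which is a simple consistency check of the implementation.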



3.3 Influence function 

In this subsection we consider the influence function of the type 1 estimator and compare this estimator with the m.l.e. in terms of the influence function.
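Theorem 5. The influence function of the type 1 estimator at $G$ is given by

$$
IF(x, G) = \{M_\beta(\xi, G)\}^{-1} \tilde{\psi}_\beta(x, \xi), \tag{13}
$$

where

$$
\begin{aligned}
M_\beta(\xi, G) ={}& \int \exp(\beta \xi' x) \Bigl[ \frac{A_p(\|\xi\|)}{\|\xi\|} \mathrm{Id}_p + \Bigl\{ 1 - A_p^2(\|\xi\|) - \frac{p A_p(\|\xi\|)}{\|\xi\|} \Bigr\} \frac{\xi \xi'}{\|\xi\|^2} - \beta x x' + \beta A_p(\|\xi\|) \frac{\xi x'}{\|\xi\|} \Bigr] dG(x) \\
&+ c_\beta(\xi)\, \mathrm{Id}_p + \frac{I_{(p-2)/2}\{(1+\beta)\|\xi\|\}}{(1+\beta)^{(p-2)/2} I_{(p-2)/2}(\|\xi\|)} \Bigl[ \beta - (2+\beta) A_p(\|\xi\|) A_p\{(1+\beta)\|\xi\|\} + 2 A_p^2(\|\xi\|) \\
&\qquad - \frac{p}{\|\xi\|} \bigl[ A_p\{(1+\beta)\|\xi\|\} - A_p(\|\xi\|) \bigr] \Bigr] \frac{\xi \xi'}{\|\xi\|^2},
\end{aligned} \tag{14}
$$

$\mathrm{Id}_p$ is the $p \times p$ identity matrix, $c_\beta(\xi)$ is as in (12), and $\tilde{\psi}_\beta(x, \xi)$ is as in (11).

Proof. The influence function (13) can be obtained by a similar approach as in Theorem 1, with $M_\beta(\xi, G) = -\int \partial \tilde{\psi}_\beta(x, \xi)/\partial \xi' \, dG(x)$. Some calculations to obtain $M_\beta$ can be done by using Theorem 1, Equations (22) and (23), and Equation (8.431.1) of Gradshteyn and Ryzhik (2007). □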
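A convenient way to evaluate (13) and (14) in practice, and to check the algebra, is to compare $M_\beta(\xi, G)$ with a central finite-difference approximation of $-\partial \int \tilde{\psi}_\beta(x, \xi)\, dG(x)/\partial \xi'$. The minimal sketch below does this for an empirical $G$, assuming NumPy and SciPy; the function names are illustrative.

```python
import numpy as np
from scipy.special import ive


def A(p, x):
    return ive(p / 2.0, x) / ive((p - 2) / 2.0, x)


def _pieces(xi, beta):
    """xi/||xi||, ||xi||, A_p(||xi||), A_p((1+b)||xi||) and h = I_nu((1+b)k) / {(1+b)^nu I_nu(k)}."""
    p = xi.size
    nu = (p - 2) / 2.0
    k = np.linalg.norm(xi)
    h = ive(nu, (1 + beta) * k) / ive(nu, k) * np.exp(beta * k) / (1 + beta) ** nu
    return xi / k, k, A(p, k), A(p, (1 + beta) * k), h


def psi_tilde(X, xi, beta):
    """Rows are psi~_beta(x_j, xi) of (11), i.e. Theorem 4 without the factor C^beta."""
    mu, k, a, ab, h = _pieces(xi, beta)
    w = np.exp(beta * (X @ xi))
    return w[:, None] * (X - a * mu) - h * (ab - a) * mu


def M_beta(X, xi, beta):
    """M_beta(xi, G) of (14) with G the empirical distribution of the rows of X."""
    p = xi.size
    mu, k, a, ab, h = _pieces(xi, beta)
    Id, P = np.eye(p), np.outer(mu, mu)
    w = np.exp(beta * (X @ xi))
    xx = X[:, :, None] * X[:, None, :]             # x x'
    mux = mu[None, :, None] * X[:, None, :]        # (xi/||xi||) x'
    inner = a / k * Id + (1 - a**2 - p * a / k) * P - beta * xx + beta * a * mux
    integral = (w[:, None, None] * inner).mean(axis=0)
    c = h * (ab - a) / k                           # c_beta(xi) of (12)
    slope = h * (beta - (2 + beta) * a * ab + 2 * a**2 - p * (ab - a) / k)
    return integral + c * Id + slope * P


def influence_type1(x, X, xi, beta):
    """IF(x, G_n) = M_beta(xi, G_n)^{-1} psi~_beta(x, xi), one row per point in x."""
    return np.linalg.solve(M_beta(X, xi, beta), psi_tilde(np.atleast_2d(x), xi, beta).T).T
```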



Here we consider the mixture model (9) as the distribution $G$. Then the integral part of the function $M_\beta(\xi, G)$ in the influence function is given by

$$
\frac{(1-\varepsilon) I_{(p-2)/2}\{(1+\beta)\|\xi\|\}}{(1+\beta)^{(p-2)/2} I_{(p-2)/2}(\|\xi\|)}\, \Phi_\beta\{(1+\beta)\xi\} + \frac{\varepsilon \|\eta\|^{(p-2)/2} I_{(p-2)/2}(\|\zeta_{1,\beta}\|)}{\|\zeta_{1,\beta}\|^{(p-2)/2} I_{(p-2)/2}(\|\eta\|)}\, \Phi_\beta(\zeta_{1,\beta}),
$$

where $\zeta_{j,\alpha} = j\alpha\xi + \eta$,

$$
\Phi_\beta(\tau) = \frac{A_p(\|\xi\|)}{\|\xi\|} \mathrm{Id}_p + \Bigl\{ 1 - A_p^2(\|\xi\|) - \frac{p A_p(\|\xi\|)}{\|\xi\|} \Bigr\} \frac{\xi\xi'}{\|\xi\|^2} - \beta\, m_2(\tau) + \beta A_p(\|\xi\|) \frac{\xi\, m_1(\tau)'}{\|\xi\|},
$$

$\mathrm{Id}_p$ is the $p \times p$ identity matrix, and $m_1(\tau) = A_p(\|\tau\|)\tau/\|\tau\|$ and $m_2(\tau) = A_p(\|\tau\|)\|\tau\|^{-1}\mathrm{Id}_p + \{1 - pA_p(\|\tau\|)/\|\tau\|\}\tau\tau'/\|\tau\|^2$ are the first and second moments, $E(X)$ and $E(XX')$, of $X \sim \mathrm{vM}_p(\tau)$. Note that, in this case, the influence functions do not involve any special functions other than the modified Bessel functions of the first kind and orders $(p-2)/2$ and $p/2$.

* * * Figure 3 about here * * * 

Figure 3 displays the influence functions (13) of the type 1 estimator at the two-dimensional von Mises-Fisher model for some selected values of $\beta$. From the four frames of the figure and Figure 1(d), it seems that the norms of the influence functions are not large for moderately large $\beta$. In particular, it seems that, for $\beta = 0.25$, $\|IF(G_{VM}, -\xi/\|\xi\|)\| - \|IF(G_{VM}, \xi/\|\xi\|)\|$ and $\|IF(G_{VM}, -\xi/\|\xi\|)\| / \|IF(G_{VM}, \xi/\|\xi\|)\|$ take smaller values than those for the maximum likelihood estimator, where $G_{VM}$ is the distribution function of $\mathrm{vM}_2\{(2.37, 0)'\}$. This result implies that the type 1 estimator is more robust than the maximum likelihood estimator.



3.4 Asymptotic normality 

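The asymptotic normality of the estimator can be shown from the theory of M-estimation. Let $X_1, \ldots, X_n$ be random samples from the von Mises-Fisher $\mathrm{vM}_p(\xi)$ distribution. Suppose that $\hat{\xi}$ is the type 1 estimator of $\xi$. Then

$$
n^{1/2}(\hat{\xi} - \xi) \xrightarrow{d} N(0, V_\beta) \quad \text{as } n \to \infty,
$$

where

$$
V_\beta = M_\beta(\tilde{\psi}_\beta, G)^{-1} Q_\beta(\tilde{\psi}_\beta, G) \{M_\beta(\tilde{\psi}_\beta, G)'\}^{-1}, \qquad Q_\beta(\tilde{\psi}_\beta, G) = \int \tilde{\psi}_\beta(x, \xi)\, \tilde{\psi}_\beta(x, \xi)'\, dG(x),
$$

and $\tilde{\psi}_\beta(x, \xi)$ and $M_\beta(\tilde{\psi}_\beta, G)$ are defined as in (11) and (14), respectively. In particular, if $G$ is the distribution function of the mixture model (9), then $Q_\beta(\tilde{\psi}_\beta, G)$ is given by

$$
\begin{aligned}
Q_\beta(\tilde{\psi}_\beta, G) ={}& \frac{(1-\varepsilon) I_{(p-2)/2}\{(1+2\beta)\|\xi\|\}}{(1+2\beta)^{(p-2)/2} I_{(p-2)/2}(\|\xi\|)}\, S_a\{(1+2\beta)\xi\} + \frac{\varepsilon \|\eta\|^{(p-2)/2} I_{(p-2)/2}(\|\zeta_{2,\beta}\|)}{\|\zeta_{2,\beta}\|^{(p-2)/2} I_{(p-2)/2}(\|\eta\|)}\, S_a(\zeta_{2,\beta}) \\
&- b_\beta \Bigl[ \frac{(1-\varepsilon) I_{(p-2)/2}\{(1+\beta)\|\xi\|\}}{(1+\beta)^{(p-2)/2} I_{(p-2)/2}(\|\xi\|)}\, T_a\{(1+\beta)\xi\} + \frac{\varepsilon \|\eta\|^{(p-2)/2} I_{(p-2)/2}(\|\zeta_{1,\beta}\|)}{\|\zeta_{1,\beta}\|^{(p-2)/2} I_{(p-2)/2}(\|\eta\|)}\, T_a(\zeta_{1,\beta}) \Bigr] + b_\beta^2 \frac{\xi\xi'}{\|\xi\|^2},
\end{aligned}
$$

where $a = A_p(\|\xi\|)$, $b_\beta = \|\xi\|\, c_\beta(\xi)$ with $c_\beta(\xi)$ as in (12), $\zeta_{j,\alpha} = j\alpha\xi + \eta$, and, with $m_1(\tau) = A_p(\|\tau\|)\tau/\|\tau\|$ and $m_2(\tau) = A_p(\|\tau\|)\|\tau\|^{-1}\mathrm{Id}_p + \{1 - pA_p(\|\tau\|)/\|\tau\|\}\tau\tau'/\|\tau\|^2$ the first and second moments of $\mathrm{vM}_p(\tau)$ and $\mathrm{Id}_p$ the $p \times p$ identity matrix,

$$
S_a(\tau) = m_2(\tau) - a\frac{\xi\, m_1(\tau)' + m_1(\tau)\, \xi'}{\|\xi\|} + a^2 \frac{\xi\xi'}{\|\xi\|^2}, \qquad T_a(\tau) = \Bigl\{ m_1(\tau) - a\frac{\xi}{\|\xi\|} \Bigr\} \frac{\xi'}{\|\xi\|} + \frac{\xi}{\|\xi\|} \Bigl\{ m_1(\tau) - a\frac{\xi}{\|\xi\|} \Bigr\}'.
$$

Remark that the asymptotic covariance matrix can be expressed in terms of the modified Bessel functions of the first kind and orders $(p-2)/2$ and $p/2$.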
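The sandwich form of $V_\beta$ can be evaluated for any discrete approximation of $G$. The minimal sketch below, assuming NumPy and SciPy (names are illustrative), takes support points with probability weights, so that $G$ can be the empirical distribution of a sample or a quadrature approximation of $\mathrm{vM}_2(\xi)$ on the circle; it re-implements (11) and (14) so that it is self-contained. As checks, $\beta = 0$ recovers the inverse Fisher information, and under the model $V_\beta - V_0$ is positive semidefinite, as expected for a Fisher-consistent M-estimator.

```python
import numpy as np
from scipy.special import ive


def A(p, x):
    return ive(p / 2.0, x) / ive((p - 2) / 2.0, x)


def sandwich_covariance(X, q, xi, beta):
    """V_beta = M^{-1} Q M'^{-1} for G putting probability q_j on the row x_j of X."""
    X, q, xi = np.asarray(X, float), np.asarray(q, float), np.asarray(xi, float)
    p = xi.size
    nu = (p - 2) / 2.0
    k = np.linalg.norm(xi)
    mu = xi / k
    a, ab = A(p, k), A(p, (1 + beta) * k)
    h = ive(nu, (1 + beta) * k) / ive(nu, k) * np.exp(beta * k) / (1 + beta) ** nu
    w = np.exp(beta * (X @ xi))
    psi = w[:, None] * (X - a * mu) - h * (ab - a) * mu          # rows psi~_beta(x_j, xi), (11)
    Id, P = np.eye(p), np.outer(mu, mu)
    inner = (a / k * Id + (1 - a**2 - p * a / k) * P
             - beta * X[:, :, None] * X[:, None, :] + beta * a * mu[None, :, None] * X[:, None, :])
    M = (np.einsum("j,jkl->kl", q * w, inner) + h * (ab - a) / k * Id
         + h * (beta - (2 + beta) * a * ab + 2 * a**2 - p * (ab - a) / k) * P)   # (14)
    Q = (psi * q[:, None]).T @ psi                                # int psi~ psi~' dG
    Minv = np.linalg.inv(M)
    return Minv @ Q @ Minv.T
```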



4 Minimum divergence estimator of Jones et al. (2001)



4.1 The divergence of Jones et al. (2001)



Next we consider another divergence which is also based on density power. It is defined as

$$
d_\gamma(g, f_\theta) = \frac{1}{\gamma(1+\gamma)} \log \int g^{1+\gamma}\, dx - \frac{1}{\gamma} \log \int g f_\theta^{\gamma}\, dx + \frac{1}{1+\gamma} \log \int f_\theta^{1+\gamma}\, dx, \qquad \gamma > 0, \tag{15}
$$

$$
d_0(g, f_\theta) = \lim_{\gamma \to 0} d_\gamma(g, f_\theta) = \int g \log(g/f_\theta)\, dx .
$$

This divergence was briefly considered by Jones et al. (2001, Equation (2.9)) as a special case of a general family of divergences. Fujisawa and Eguchi (2008) investigated detailed properties of the divergence with emphasis on the case in which the underlying distribution contains heavy contamination.

The divergence between the von Mises-Fisher density and a density underlying the data can be calculated as follows. The procedure to derive this divergence is similar to that used to obtain the Basu et al. (1998) divergence in Theorem 3, and therefore the proof is omitted.

Theorem 6. Let $f_\xi$ be the von Mises-Fisher $\mathrm{vM}_p(\xi)$ density. Then divergence (15) between $f_\xi$ and an arbitrary density $g$ is given by

$$
d_\gamma(g, f_\xi) = \frac{1}{\gamma(1+\gamma)} \log \int g^{1+\gamma}\, dx - \frac{1}{\gamma} \log \int \exp(\gamma \xi' x)\, g(x)\, dx + \frac{1}{1+\gamma} \log \frac{(2\pi)^{p/2} I_{(p-2)/2}\{\|(1+\gamma)\xi\|\}}{\|(1+\gamma)\xi\|^{(p-2)/2}}. \tag{16}
$$

Note that the expression for this divergence is slightly simpler than that for the Basu et al. (1998) divergence.
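For the empirical density, the first term of (16) does not depend on $\xi$, so the remaining two terms form the objective minimised by the type 0 estimator introduced below. The minimal sketch below, assuming NumPy and SciPy (names are illustrative), evaluates these terms on the log scale for a weighted discrete $G$; by passing quadrature weights it can be compared with the corresponding terms of definition (15) computed numerically on the circle.

```python
import numpy as np
from scipy.special import ive, logsumexp


def log_c(xi):
    """log normalising constant of vM_p(xi), for xi != 0."""
    xi = np.asarray(xi, dtype=float)
    p, k = xi.size, np.linalg.norm(xi)
    nu = (p - 2) / 2.0
    return nu * np.log(k) - (p / 2.0) * np.log(2.0 * np.pi) - (np.log(ive(nu, k)) + k)


def gamma_objective(X, xi, gamma, q=None):
    """Second and third terms of (16) for G putting weight q_j (default 1/n) on row x_j."""
    X, xi = np.asarray(X, float), np.asarray(xi, float)
    n = X.shape[0]
    log_q = np.full(n, -np.log(n)) if q is None else np.log(q)
    second = -logsumexp(gamma * (X @ xi) + log_q) / gamma
    # log{(2 pi)^{p/2} I_nu(||(1+g) xi||) / ||(1+g) xi||^nu} equals -log C((1+g) xi)
    third = -log_c((1.0 + gamma) * xi) / (1.0 + gamma)
    return second + third
```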
As a special case of the underlying density in the Jones et al. divergence, consider again the mixture (9) of the two von Mises-Fisher densities. Then the divergence can be expressed as

$$
\begin{aligned}
d_\gamma(g, f_\xi) ={}& \frac{1}{\gamma(1+\gamma)} \log \int \{(1-\varepsilon) f_\xi + \varepsilon f_\eta\}^{1+\gamma}\, dx \\
&- \frac{1}{\gamma} \log \Bigl[ \frac{(1-\varepsilon) I_{(p-2)/2}\{(1+\gamma)\|\xi\|\}}{(1+\gamma)^{(p-2)/2} I_{(p-2)/2}(\|\xi\|)} + \varepsilon \frac{\|\eta\|^{(p-2)/2} I_{(p-2)/2}(\|\gamma\xi + \eta\|)}{I_{(p-2)/2}(\|\eta\|)\, \|\gamma\xi + \eta\|^{(p-2)/2}} \Bigr] \\
&+ \frac{1}{1+\gamma} \log \frac{(2\pi)^{p/2} I_{(p-2)/2}\{\|(1+\gamma)\xi\|\}}{\|(1+\gamma)\xi\|^{(p-2)/2}} .
\end{aligned}
$$

From the discussion in Fujisawa and Eguchi (2008), one can ignore the contamination $f_\eta$ if the second term in the second logarithm is sufficiently small. This condition is satisfied, for example, when $\|\xi\|$ is large and the contaminating density $f_\eta$ puts little mass near $\xi/\|\xi\|$.



4.2 Estimating equation 

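The estimating equation of the Jones et al. divergence is

$$
\frac{\int f_\xi^{\gamma} u_\xi\, dG}{\int f_\xi^{\gamma}\, dG} - \frac{\int f_\xi^{1+\gamma} u_\xi\, dx}{\int f_\xi^{1+\gamma}\, dx} = 0,
$$

where $u_\xi$ is defined as in Section 3.2, or, equivalently,

$$
\int \psi_\gamma(x, \xi)\, dG(x) = 0, \qquad \text{where} \quad \psi_\gamma(x, \xi) = f_\xi^{\gamma}(x) \Bigl\{ u_\xi(x) - \frac{\int f_\xi^{1+\gamma}(y)\, u_\xi(y)\, dy}{\int f_\xi^{1+\gamma}(y)\, dy} \Bigr\}. \tag{17}
$$

It is remarked here that this equation has been discussed by Windham (1994), although the divergence on which the estimating equation is based was not considered there. As in Jones et al. (2001), the estimator derived from this estimating equation is called the type 0 estimator in this paper.

Theorem 7. The function $\psi_\gamma(x, \xi)$ is given by

$$
\psi_\gamma(x, \xi) = C^{\gamma} \exp(\gamma \xi' x) \Bigl[ x - A_p\{(1+\gamma)\|\xi\|\} \frac{\xi}{\|\xi\|} \Bigr],
$$

where $C$ is as in Theorem 4.

The proof is similar to that for Theorem 4 and omitted. In a similar manner as in Section 3.2, we redefine the $\psi$-function as $\tilde{\psi}_\gamma(x, \xi) = C^{-\gamma} \psi_\gamma(x, \xi)$ and consider M-estimation based on $\tilde{\psi}_\gamma$.

On substituting an empirical distribution $G_n(X_1, \ldots, X_n)$ for $G$ in (17), it follows that

$$
\sum_{j=1}^n \exp(\gamma \xi' x_j)\, x_j - A_p\{(1+\gamma)\|\xi\|\} \frac{\xi}{\|\xi\|} \sum_{j=1}^n \exp(\gamma \xi' x_j) = 0 .
$$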
Therefore it can be seen that the minimum divergence estimator satisfies the following equation

$$
\xi = \frac{1}{1+\gamma} A_p^{-1}\!\left\{ \frac{\|\sum_j w_{j,\gamma}(\xi)\, x_j\|}{\sum_j w_{j,\gamma}(\xi)} \right\} \frac{\sum_j w_{j,\gamma}(\xi)\, x_j}{\|\sum_j w_{j,\gamma}(\xi)\, x_j\|}, \tag{18}
$$

where $w_{j,\gamma}(\xi) = \exp(\gamma \xi' x_j)$. Given estimating Equation (18), the following algorithm is naturally induced.

Algorithm for the type 0 estimate

Step 1. Take an initial value $\xi_0$.

Step 2. Compute $\xi_1, \ldots, \xi_N$ as follows until the estimate $\xi_N$ remains virtually unchanged from the previous estimate $\xi_{N-1}$:

$$
\xi_{t+1} = \frac{1}{1+\gamma} A_p^{-1}\!\left\{ \frac{\|\sum_j w_{j,\gamma}(\xi_t)\, x_j\|}{\sum_j w_{j,\gamma}(\xi_t)} \right\} \frac{\sum_j w_{j,\gamma}(\xi_t)\, x_j}{\|\sum_j w_{j,\gamma}(\xi_t)\, x_j\|}, \qquad t = 0, 1, \ldots, N-1.
$$

Step 3. Record $\xi_N$ as an estimate of $\xi$.

This algorithm can also be derived from an iterative algorithm of Fujisawa and Eguchi (2008, Section 4), and from their discussion, the monotonicity of the algorithm follows:

$$
d_\gamma(\bar{g}, f_{\xi_0}) \geq d_\gamma(\bar{g}, f_{\xi_1}) \geq \cdots \geq d_\gamma(\bar{g}, f_{\xi_N}),
$$

where $\bar{g}$ denotes the empirical density.

As for a prescription for choosing the tuning parameter $\gamma$, cross-validation is available. See Section 7 for details.
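A minimal sketch of the type 0 iteration in Python, assuming NumPy and SciPy; the names, tolerance and iteration cap are illustrative. The weights are rescaled by $\exp(-\gamma\|\xi_t\|)$, which does not change the ratio inside $A_p^{-1}$. The function also records the $\xi$-dependent part of $d_\gamma(\bar{g}, f_\xi)$ from (16) along the path, so that the monotonicity stated above can be observed.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import ive, logsumexp


def A(p, x):
    return 0.0 if x == 0.0 else ive(p / 2.0, x) / ive((p - 2) / 2.0, x)


def A_inv(p, y):
    hi = 1.0
    while A(p, hi) < y:
        hi *= 2.0
    return brentq(lambda t: A(p, t) - y, 0.0, hi)


def objective(X, xi, gamma):
    """xi-dependent part of d_gamma(g_bar, f_xi) in (16) for the empirical density g_bar."""
    p, k = X.shape[1], (1.0 + gamma) * np.linalg.norm(xi)
    nu = (p - 2) / 2.0
    log_norm = (p / 2.0) * np.log(2.0 * np.pi) + np.log(ive(nu, k)) + k - nu * np.log(k)
    return -(logsumexp(gamma * (X @ xi)) - np.log(X.shape[0])) / gamma + log_norm / (1.0 + gamma)


def type0_estimate(X, gamma, xi0, tol=1e-10, max_iter=1000):
    """Iteration (18); returns the estimate and the objective values along the path."""
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    xi = np.asarray(xi0, dtype=float)
    path = [objective(X, xi, gamma)]
    for _ in range(max_iter):
        w = np.exp(gamma * (X @ xi - np.linalg.norm(xi)))   # rescaled w_{j,gamma}(xi)
        S = w @ X
        R = np.linalg.norm(S)
        xi_new = A_inv(p, R / w.sum()) / (1.0 + gamma) * S / R
        path.append(objective(X, xi_new, gamma))
        if np.linalg.norm(xi_new - xi) < tol * (1.0 + np.linalg.norm(xi)):
            return xi_new, path
        xi = xi_new
    return xi, path
```

Each step minimises a majorising surrogate of the objective in closed form, which is why the recorded values cannot increase apart from rounding error.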



4.3 Influence function 

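The influence function of the type 0 estimator (17) is provided in the following theorem. The basic process to obtain the influence function is similar to that in Theorem 5 and omitted.

Theorem 8. The influence function of the type 0 estimator at $G$ is given by

$$
IF(x; \xi, G) = \{M_\gamma(\xi, G)\}^{-1} \tilde{\psi}_\gamma(x, \xi), \tag{19}
$$

where

$$
M_\gamma(\xi, G) = -\int \exp(\gamma \xi' x) \Bigl[ \gamma \Bigl\{ x x' - A_p\{(1+\gamma)\|\xi\|\} \frac{\xi x'}{\|\xi\|} \Bigr\} - \frac{A_p\{(1+\gamma)\|\xi\|\}}{\|\xi\|} \mathrm{Id}_p - \Bigl\{ (1+\gamma)\bigl[1 - A_p^2\{(1+\gamma)\|\xi\|\}\bigr] - \frac{p A_p\{(1+\gamma)\|\xi\|\}}{\|\xi\|} \Bigr\} \frac{\xi\xi'}{\|\xi\|^2} \Bigr] dG(x)
$$

and $\mathrm{Id}_p$ is the $p \times p$ identity matrix.

If $G$ is the distribution function of the mixture model with density (9), then the function $M_\gamma(\xi, G)$ in Theorem 8 can be expressed as

$$
M_\gamma(\xi, G) = \frac{(1-\varepsilon) I_{(p-2)/2}\{(1+\gamma)\|\xi\|\}}{(1+\gamma)^{(p-2)/2} I_{(p-2)/2}(\|\xi\|)}\, \Phi_\gamma\{(1+\gamma)\xi\} + \frac{\varepsilon \|\eta\|^{(p-2)/2} I_{(p-2)/2}(\|\zeta_{1,\gamma}\|)}{\|\zeta_{1,\gamma}\|^{(p-2)/2} I_{(p-2)/2}(\|\eta\|)}\, \Phi_\gamma(\zeta_{1,\gamma}),
$$

where $\zeta_{j,\alpha} = j\alpha\xi + \eta$, $a_\gamma = A_p\{(1+\gamma)\|\xi\|\}$,

$$
\Phi_\gamma(\tau) = \frac{a_\gamma}{\|\xi\|} \mathrm{Id}_p + \Bigl\{ (1+\gamma)(1 - a_\gamma^2) - \frac{p\, a_\gamma}{\|\xi\|} \Bigr\} \frac{\xi\xi'}{\|\xi\|^2} - \gamma\, m_2(\tau) + \gamma\, a_\gamma \frac{\xi\, m_1(\tau)'}{\|\xi\|},
$$

and $m_1(\tau) = A_p(\|\tau\|)\tau/\|\tau\|$ and $m_2(\tau) = A_p(\|\tau\|)\|\tau\|^{-1}\mathrm{Id}_p + \{1 - pA_p(\|\tau\|)/\|\tau\|\}\tau\tau'/\|\tau\|^2$ are the first and second moments of $X \sim \mathrm{vM}_p(\tau)$.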

* * * Figure 4 about here * * * 

Figure 4 plots the influence functions (19) of the type 0 estimator for four selected values of $\gamma$. Note that the values of the tuning parameters in the four frames of this figure correspond to those in Figure 3. Comparing these two figures, it seems that the influence functions of both estimators take quite similar values when both tuning parameters are the same.



4.4 Asymptotic normality 

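In a similar way as in Section 3.4, one can show the asymptotic normality of the estimator. Since $\tilde{\psi}_\gamma(x, \xi)$ and $M_\gamma(\tilde{\psi}_\gamma, G)$ have already been given in Theorems 7 and 8, the function $Q_\gamma(\tilde{\psi}_\gamma, G) = \int \tilde{\psi}_\gamma(x, \xi)\, \tilde{\psi}_\gamma(x, \xi)'\, dG(x)$ and the asymptotic covariance matrix can be calculated in a straightforward manner. In particular, if $G$ is the distribution function of the mixture model with density (9), then $Q_\gamma(\tilde{\psi}_\gamma, G)$ is given by

$$
Q_\gamma(\tilde{\psi}_\gamma, G) = \frac{(1-\varepsilon) I_{(p-2)/2}\{(1+2\gamma)\|\xi\|\}}{(1+2\gamma)^{(p-2)/2} I_{(p-2)/2}(\|\xi\|)}\, S_{a_\gamma}\{(1+2\gamma)\xi\} + \frac{\varepsilon \|\eta\|^{(p-2)/2} I_{(p-2)/2}(\|\zeta_{2,\gamma}\|)}{\|\zeta_{2,\gamma}\|^{(p-2)/2} I_{(p-2)/2}(\|\eta\|)}\, S_{a_\gamma}(\zeta_{2,\gamma}),
$$

where $a_\gamma = A_p\{(1+\gamma)\|\xi\|\}$, $\zeta_{j,\alpha} = j\alpha\xi + \eta$, and, for $X \sim \mathrm{vM}_p(\tau)$,

$$
S_a(\tau) = E\Bigl\{ \Bigl( X - a\frac{\xi}{\|\xi\|} \Bigr) \Bigl( X - a\frac{\xi}{\|\xi\|} \Bigr)' \Bigr\} = m_2(\tau) - a\frac{\xi\, m_1(\tau)' + m_1(\tau)\, \xi'}{\|\xi\|} + a^2 \frac{\xi\xi'}{\|\xi\|^2},
$$

with $m_1(\tau) = A_p(\|\tau\|)\tau/\|\tau\|$, $m_2(\tau) = A_p(\|\tau\|)\|\tau\|^{-1}\mathrm{Id}_p + \{1 - pA_p(\|\tau\|)/\|\tau\|\}\tau\tau'/\|\tau\|^2$ and $\mathrm{Id}_p$ the $p \times p$ identity matrix. Again, the asymptotic covariance matrix does not involve any special functions other than the modified Bessel functions of the first kind and orders $(p-2)/2$ and $p/2$.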
5 Comparison among the robust estimators



5.1 Comparison between type 0 and type 1 estimators

In this subsection we compare the two proposed estimators of the parameters of the von Mises-Fisher distribution. A detailed comparison between the two estimators for the general family of distributions has been given in Jones et al. (2001). Here we consider some properties of the estimators which are specific to the von Mises-Fisher distribution.

The numerical algorithms for both types of estimates presented in Sections 3.2 and 4.2 have some similarities and differences. A similarity is that both algorithms are expressed in relatively simple forms and require the inverse of the function $A_p$, i.e., of the ratio of the modified Bessel functions. However, as discussed in Section 2.2, it is fairly easy to calculate $A_p^{-1}$ numerically since $A_p$ is bounded and strictly increasing. A slight difference is that the algorithm for the type 0 estimator appears to be slightly simpler than that for the type 1 estimator, as it does not involve a subtraction in the argument of $A_p^{-1}$ in Step 2. Another special feature of the type 0 estimator is the monotonicity of its algorithm, as shown in Section 4.2.

Next we discuss the influence functions and asymptotic covariance matrices of both estimators, which can be expressed by using the modified Bessel functions of the first kind and orders $(p-2)/2$ and $p/2$. Their expressions for the type 0 estimator are simpler, although both require the calculation of the aforementioned functions. A comparison between Figures 3 and 4 suggests that the influence functions of the two estimators behave similarly, at least for practical choices of the tuning parameters. When the tuning parameters of both estimators are large, e.g., $\beta = \gamma = 0.75$, a simulation study, which will be given in the next section, implies that the influence functions of the two estimators behave differently. However, in most realistic cases, in which the tuning parameters are moderately small, the performance of the estimators and their influence functions seem quite similar. As for the asymptotic covariance matrices, as discussed in Jones et al. (2001, Section 3.3), the large-sample variance matrices of both estimators show a relatively small loss of efficiency when the tuning parameters are equal and small. We will provide further comparison of these estimators through a simulation study and an example in later sections.



5.2 Similarities to and differences from Lenth's (1981) estimator



Lenth (1981) proposed a robust estimator of the location parameter of the two-dimensional von Mises-Fisher distribution and briefly considered an algorithm to estimate both location and concentration parameters. The estimator is defined as follows. Assume that $\theta_1, \ldots, \theta_n$ are a random sample from the von Mises distribution $\mathrm{vM}_2\{(\kappa\cos\mu, \kappa\sin\mu)'\}$. Define



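$$
C_w = \frac{\sum_j w_j \cos\theta_j}{\sum_j w_j}, \qquad S_w = \frac{\sum_j w_j \sin\theta_j}{\sum_j w_j}, \qquad R_w = \left(C_w^2 + S_w^2\right)^{1/2},
$$

$$
w_j = \frac{\psi\{t(\theta_j - \mu; \kappa)\}}{t(\theta_j - \mu; \kappa)}, \qquad t(\phi; \kappa) = \pm\{2\kappa(1 - \cos\phi)\}^{1/2},
$$

$$
\psi_H(t) = \begin{cases} t, & |t| \le c, \\ c\,\mathrm{sign}(t), & |t| > c, \end{cases} \qquad
\psi_A(t) = \begin{cases} c\sin(t/c), & |t| \le c\pi, \\ 0, & |t| > c\pi, \end{cases} \qquad (20)
$$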






with $+$ or $-$ chosen according to whether $\phi \pmod{2\pi}$ belongs to $[0, \pi)$ or not. Then the estimator is defined as the solution of the following estimating equations:

$$
\sum_{j=1}^{n} w_j \sin(\theta_j - \mu) = 0 \quad \text{and} \quad \kappa = A_2^{-1}(R_w).
$$
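The estimating equations above suggest an iteratively reweighted scheme: for fixed weights, $\mu = \mathrm{atan2}(S_w, C_w)$ solves $\sum_j w_j \sin(\theta_j - \mu) = 0$, and $\kappa$ is then updated by $A_2^{-1}(R_w)$. The Python sketch below implements this fixed-point iteration with Huber's or Andrews' $\psi$ from (20). The starting values (the maximum likelihood estimates), the constant $c = 1.5$, the stopping rule, the cap on the bisection bracket and all function names are our assumptions; Lenth's own algorithm may differ in detail.

```python
import math


def a2(kappa: float) -> float:
    """A_2(kappa) = I_1(kappa) / I_0(kappa) via a backward continued fraction."""
    r = 0.0
    for k in range(int(2 * kappa) + 60, 0, -1):
        r = kappa / (2.0 * k + kappa * r)
    return r


def a2_inv(rbar: float) -> float:
    """Invert A_2 by bisection; A_2 is strictly increasing and bounded by 1."""
    lo, hi = 0.0, 1.0
    while a2(hi) < rbar and hi < 1e4:  # the cap keeps this sketch cheap
        lo, hi = hi, 2.0 * hi
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if a2(mid) < rbar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def huber_psi(t: float, c: float) -> float:
    return t if abs(t) <= c else math.copysign(c, t)


def andrews_psi(t: float, c: float) -> float:
    return c * math.sin(t / c) if abs(t) <= c * math.pi else 0.0


def lenth_estimate(theta, psi=huber_psi, c=1.5, n_iter=200, tol=1e-10):
    """Iteratively reweighted solution of sum_j w_j sin(theta_j - mu) = 0
    and kappa = A_2^{-1}(R_w), started from the maximum likelihood estimates."""
    n = len(theta)
    c_bar = sum(map(math.cos, theta)) / n
    s_bar = sum(map(math.sin, theta)) / n
    mu, kappa = math.atan2(s_bar, c_bar), a2_inv(math.hypot(c_bar, s_bar))
    for _ in range(n_iter):
        weights = []
        for x in theta:
            # |t(theta_j - mu; kappa)|: psi is odd, so w = psi(t)/t ignores the sign
            t = math.sqrt(2.0 * kappa * (1.0 - math.cos(x - mu)))
            weights.append(1.0 if t == 0.0 else psi(t, c) / t)
        total = sum(weights)
        c_w = sum(w * math.cos(x) for w, x in zip(weights, theta)) / total
        s_w = sum(w * math.sin(x) for w, x in zip(weights, theta)) / total
        # For fixed weights, atan2(S_w, C_w) solves sum_j w_j sin(theta_j - mu) = 0
        new_mu, new_kappa = math.atan2(s_w, c_w), a2_inv(math.hypot(c_w, s_w))
        done = abs(math.sin(new_mu - mu)) < tol and abs(new_kappa - kappa) < tol
        mu, kappa = new_mu, new_kappa
        if done:
            break
    return mu, kappa
```

On a small sample with one distant observation, e.g. `[-0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3, 3.0]`, both weight choices pull the location estimate closer to 0 than the sample mean direction does, because the distant point receives a weight below one.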

This estimator is somewhat related to the types 0 and 1 estimators discussed in this paper. All three estimators are related in the sense that the parameters are estimated by introducing weight functions into the estimating equations. Also, all three estimators can be obtained numerically through fairly simple algorithms.

However, our two families of estimators differ from Lenth's estimator. One obvious distinction is that our estimators cover the von Mises-Fisher distribution of general dimension, while the Lenth estimator, as it stands, can be used only in the two-dimensional case. In addition, there are other differences between the Lenth estimator and ours even in the two-dimensional case. As seen in Equation (12) and the corresponding equation for the type 0 estimator, our estimators adopt powers of the densities as weight functions, whereas Lenth (1981) used the weight functions (20) proposed by Huber (1964) or Andrews (1974).



This distinction matters when discussing Fisher consistency. As shown in Sections 3.2 and 4.2, our estimators are Fisher consistent. On the other hand, as shown below, Fisher consistency does not hold for the Lenth estimator. To prove this, we first show the following general result, which helps us evaluate the theoretical first cosine moment for the Lenth estimator. The proof is given in Appendix D.

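Lemma 1. Let $f$ be a probability density function on the circle $[-\pi, \pi)$. Assume that $w$ is a function on $[-\pi, \pi)$ which satisfies the following properties:

1. $w$ is symmetric about 0, i.e., $w(\theta) = w(-\theta)$ for any $\theta \in [-\pi, \pi)$.

2. If $\cos\theta_1 > \cos\theta_2$, then $w(\theta_1) \ge w(\theta_2)$.

3. $0 < \int_{-\pi}^{\pi} w(\theta) f(\theta)\,d\theta < \infty$.

Then

$$
\frac{\int_{-\pi}^{\pi} w(\theta)\cos\theta\, f(\theta)\,d\theta}{\int_{-\pi}^{\pi} w(\theta) f(\theta)\,d\theta} \ge \int_{-\pi}^{\pi} \cos\theta\, f(\theta)\,d\theta.
$$

The equality holds if and only if $w(\theta) = c$ for some constant $c$.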
-TT 



Using Lemma 1, we immediately obtain the following result. See Appendix E for the proof.

Theorem 9. Assume that $\psi(t)/t$ is not a constant function. Then Lenth's estimator is not Fisher consistent.

However, we note that Lenth's estimator of the location $\mu$, which is the main focus of Lenth (1981), is Fisher consistent if the concentration $\kappa$ is known, and the estimator can be useful in that situation.
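Theorem 9 can also be seen numerically. The sketch below assumes Huber weights $w = \psi_H(t)/t$ with an illustrative constant $c = 1.5$, evaluated at the true parameters with $\mu = 0$. It computes the population value of $R_w$ for $\mathrm{vM}_2\{(\kappa, 0)'\}$ by the midpoint rule (the sine moment vanishes by symmetry) and compares it with $A_2(\kappa) = \int \cos\theta f_{\mathrm{vM}}(\theta)\,d\theta$. The function names are hypothetical.

```python
import math


def huber_weight(t: float, c: float) -> float:
    """w = psi_H(t) / t for t >= 0 (equal to 1 inside [0, c])."""
    return 1.0 if t <= c else c / t


def cosine_moments(kappa: float, c: float, m: int = 20000):
    """Return (population R_w, A_2(kappa)) for vM_2 with mu = 0.

    Uses the midpoint rule on [-pi, pi); the von Mises density is left
    unnormalised because only ratios of integrals are needed.
    """
    h = 2.0 * math.pi / m
    num_w = den_w = num = den = 0.0
    for i in range(m):
        th = -math.pi + (i + 0.5) * h
        f = math.exp(kappa * math.cos(th))
        t = math.sqrt(2.0 * kappa * (1.0 - math.cos(th)))
        w = huber_weight(t, c)
        num_w += w * math.cos(th) * f
        den_w += w * f
        num += math.cos(th) * f
        den += f
    return num_w / den_w, num / den
```

With $\kappa = 2.37$ the weighted moment exceeds $A_2(\kappa)$, so $A_2^{-1}(R_w)$ overestimates $\kappa$, in line with Lemma 1; when $c$ is so large that every weight equals one, the two moments coincide.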
6 Simulation study



In this section, a simulation study is carried out to compare the finite-sample performance of the two proposed estimators.

We consider the performance of the estimators in the following two cases: (i) random samples of some selected sizes are generated from von Mises-Fisher distributions without contamination, and (ii) random samples of size 100 are generated from contaminated von Mises-Fisher distributions with some selected contamination ratios. Case (i) examines how much efficiency the estimators lose when small samples contain no outliers. More attention is paid to case (ii), as robustness is the main theme of the paper. We do not consider Lenth's (1981) estimator here since, as shown in the previous section, its estimator of the concentration parameter can be biased because Fisher consistency does not hold.

First consider case (i), in which finite samples are generated from the von Mises-Fisher distribution without contamination. Random samples of sizes $n = 10, 20, 30, 50$ and $100$ are generated from the von Mises-Fisher distributions $\mathrm{vM}_p(\xi)$ with $p = 2$ and $\xi = (2.37, 0)'$ and with $p = 3$ and $\xi = (3.99, 0, 0)'$. For each combination of $n$ and $\xi$, 2000 simulation samples are generated. We evaluate the estimators in terms of the mean squared error, which is estimated by $\sum_{j=1}^{2000} \|\hat\xi_j - \xi\|^2 / 2000$, where $\hat\xi_j$ $(j = 1, \ldots, 2000)$ is the estimate of $\xi$ for the $j$th simulation sample.
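The Monte Carlo estimate of the mean squared error can be sketched for the circular case ($p = 2$) as follows. The sketch assumes Python's `random.vonmisesvariate` for sampling and uses the maximum likelihood estimator $\hat\xi = A_2^{-1}(\bar R)(\bar C, \bar S)'/\bar R$ as the example estimator. The names `mle_xi` and `mc_mse`, and the cap on the bisection bracket, are our own choices.

```python
import math
import random


def a2(kappa):
    """A_2(kappa) = I_1(kappa) / I_0(kappa) via a backward continued fraction."""
    r = 0.0
    for k in range(int(2 * kappa) + 60, 0, -1):
        r = kappa / (2.0 * k + kappa * r)
    return r


def a2_inv(rbar):
    """Bisection inverse of the increasing, bounded function A_2."""
    lo, hi = 0.0, 1.0
    while a2(hi) < rbar and hi < 1e4:
        lo, hi = hi, 2.0 * hi
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if a2(mid) < rbar else (lo, mid)
    return 0.5 * (lo + hi)


def mle_xi(theta):
    """Maximum likelihood estimate of xi = kappa (cos mu, sin mu)' for vM_2."""
    n = len(theta)
    c = sum(map(math.cos, theta)) / n
    s = sum(map(math.sin, theta)) / n
    rbar = math.hypot(c, s)
    if rbar == 0.0:
        return (0.0, 0.0)
    kappa = a2_inv(rbar)
    return (kappa * c / rbar, kappa * s / rbar)


def mc_mse(estimator, xi, n, reps=2000, seed=1):
    """Monte Carlo estimate sum_j ||xi_hat_j - xi||^2 / reps under vM_2(xi)."""
    rng = random.Random(seed)
    kappa, mu = math.hypot(*xi), math.atan2(xi[1], xi[0])
    total = 0.0
    for _ in range(reps):
        sample = [rng.vonmisesvariate(mu, kappa) for _ in range(n)]
        est = estimator(sample)
        total += (est[0] - xi[0]) ** 2 + (est[1] - xi[1]) ** 2
    return total / reps
```

Calling `mc_mse` with the same seed for a robust estimator and for `mle_xi` and taking the ratio gives a relative mean squared error of the kind reported in Table 1; sharing the seed (common random numbers) is our choice and reduces the Monte Carlo noise in the ratio.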

* * * Table 1 about here * * * 

Table 1 shows the estimates of the relative mean squared errors of the types 0 and 1 estimators for some selected values of the tuning parameters. A comparison of the two estimators suggests that, when their tuning parameters are equal, the estimated relative mean squared errors generally take similar values. An exception is the case in which the sample size is small and the tuning parameters are large; in this case the type 1 estimator generally outperforms the type 0 estimator, although neither estimator seems satisfactory. Except for this case, however, only a little efficiency is lost by using these robust estimators. The table suggests that the relative mean squared error decreases as the tuning parameter decreases. It is also noted that the relative mean squared errors of both estimators generally decrease toward one as the sample size increases, which is in line with the consistency of the estimators.

Next we consider case (ii) in order to examine the robustness of the two families of estimators. Two families of distributions are chosen as contaminations, namely, the uniform and von Mises-Fisher distributions. The uniform distribution or the von Mises-Fisher distribution with small concentration $\|\xi\|$ is often used as a contamination, as seen in Ducharme and Milasevic (1987) and Chan and He (1993). It seems less common to assume a von Mises-Fisher distribution with a fairly large concentration parameter as a contamination, but this model also appears to be a reasonable choice if its parameter is chosen such that most observations from it lie in a region where samples from the von Mises-Fisher distribution of interest are unlikely to be observed.

First consider uniform contamination. We generate random samples of size 100 from a mixture of the von Mises-Fisher and uniform distributions of the form $(1-\varepsilon)\mathrm{vM}_p(\xi) + \varepsilon U_p$ for some selected values of $\varepsilon$, $p$ and $\xi$, where $U_p$ denotes the uniform distribution on the sphere, and calculate the estimates of the relative mean squared errors in the same way as in the previous simulation.
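The contaminated design can be illustrated in the same way for $p = 2$. The sketch below draws from $(1-\varepsilon)\mathrm{vM}_2(\xi) + \varepsilon U_2$ and compares a type 0 estimate with the maximum likelihood estimate on the same samples. For the type 0 estimator it assumes the fixed point implied by the type 0 estimating equation $\sum_j f_\xi^\gamma(x_j)u_\xi(x_j)/\sum_j f_\xi^\gamma(x_j) = \int f_\xi^{1+\gamma}u_\xi\,dx/\int f_\xi^{1+\gamma}\,dx$ for this model: the mean resultant vector with weights $\exp(\gamma\xi'x_j)$ equals $A_2\{(1+\gamma)\|\xi\|\}\xi/\|\xi\|$. This is our own reading and may differ in detail from the algorithm of Section 4.2; all function names are hypothetical.

```python
import math
import random


def a2(kappa):
    """A_2(kappa) = I_1(kappa) / I_0(kappa) via a backward continued fraction."""
    r = 0.0
    for k in range(int(2 * kappa) + 60, 0, -1):
        r = kappa / (2.0 * k + kappa * r)
    return r


def a2_inv(rbar):
    """Bisection inverse of the increasing, bounded function A_2."""
    lo, hi = 0.0, 1.0
    while a2(hi) < rbar and hi < 1e4:
        lo, hi = hi, 2.0 * hi
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if a2(mid) < rbar else (lo, mid)
    return 0.5 * (lo + hi)


def type0_xi(theta, gamma, n_iter=500, tol=1e-8):
    """Assumed type 0 fixed point; gamma = 0 returns the MLE."""
    cs = [(math.cos(t), math.sin(t)) for t in theta]
    c0 = sum(c for c, _ in cs) / len(cs)
    s0 = sum(s for _, s in cs) / len(cs)
    r0 = math.hypot(c0, s0)
    k0 = a2_inv(r0)
    xi = (k0 * c0 / r0, k0 * s0 / r0)  # start from the MLE
    for _ in range(n_iter):
        proj = [xi[0] * c + xi[1] * s for c, s in cs]
        top = max(proj)  # shift before exp to avoid overflow
        w = [math.exp(gamma * (q - top)) for q in proj]
        total = sum(w)
        mc = sum(wi * c for wi, (c, _) in zip(w, cs)) / total
        ms = sum(wi * s for wi, (_, s) in zip(w, cs)) / total
        r = math.hypot(mc, ms)
        kappa = a2_inv(r) / (1.0 + gamma)
        new = (kappa * mc / r, kappa * ms / r)
        if math.hypot(new[0] - xi[0], new[1] - xi[1]) < tol:
            return new
        xi = new
    return xi


def contaminated_sample(rng, xi, eps, n):
    """n draws (radians) from (1 - eps) vM_2(xi) + eps U_2."""
    kappa, mu = math.hypot(*xi), math.atan2(xi[1], xi[0])
    return [rng.uniform(0.0, 2.0 * math.pi) if rng.random() < eps
            else rng.vonmisesvariate(mu, kappa) for _ in range(n)]


def relative_mse(gamma, xi, eps, n=100, reps=2000, seed=0):
    """MSE of the type 0 sketch divided by the MSE of the MLE, same samples."""
    rng = random.Random(seed)
    se_rob = se_mle = 0.0
    for _ in range(reps):
        theta = contaminated_sample(rng, xi, eps, n)
        rob, mle = type0_xi(theta, gamma), type0_xi(theta, 0.0)
        se_rob += (rob[0] - xi[0]) ** 2 + (rob[1] - xi[1]) ** 2
        se_mle += (mle[0] - xi[0]) ** 2 + (mle[1] - xi[1]) ** 2
    return se_rob / se_mle
```

With a highly concentrated model such as $\xi = (10.27, 0)'$ and $\varepsilon = 0.1$, even a few dozen replications give a ratio well below one; a small number of replications only gives a rough figure, however.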
* * * Table 2 about here * * *



Table 2 displays the estimates of the relative mean squared errors of the types 0 and 1 estimators with respect to the maximum likelihood estimator for some selected values of the tuning parameters. The table suggests that, when the concentration parameter or, equivalently, $A_p(\|\xi\|)$ is small, the relative mean squared errors are close to one. This is consistent with the fact that the von Mises-Fisher distribution approaches the uniform distribution as $\|\xi\|$ tends to 0. On the other hand, when $\|\xi\|$ is large, the robust estimators outperform the maximum likelihood estimator. In particular, when $A_p(\|\xi\|)$ is close to 1, the relative mean squared errors of the proposed estimators are much smaller than one. It is also noted that the tuning parameters which minimise the relative mean squared errors increase as the contamination ratio $\varepsilon$ increases.

Second, we examine the robustness of the estimators when the true distribution is a mixture of two von Mises-Fisher distributions. This time, we generate random samples of size 100 from a mixture of the form $(1-\varepsilon)\mathrm{vM}_p(\xi) + \varepsilon\,\mathrm{vM}_p(\zeta)$ for some selected values of $\varepsilon$, $p$, $\xi$ and $\zeta$.

* * * Table 3 about here * * * 

Table 3 gives the estimates of the relative mean squared errors of the types 0 and 1 estimators with respect to the maximum likelihood estimator for some selected contamination ratios and tuning parameters for a mixture of two von Mises-Fisher distributions. Note that the contaminating distribution is a von Mises-Fisher distribution rather than the uniform distribution, so that almost all contaminating observations lie in the region where observations from the distribution of interest are unlikely to be observed. The table implies that the two estimators outperform the maximum likelihood estimator if the tuning parameters are chosen appropriately. In particular, if $\varepsilon$ is large, both proposed estimators perform much better than the maximum likelihood estimator. The two estimators also behave quite similarly, especially for small values of the tuning parameters. Since the contaminating distribution is concentrated in the direction opposite to $\xi$, the tuning parameters minimising the relative mean squared errors are greater than those in Table 2 for fixed values of $\varepsilon$.



7 Example 



To illustrate how our methods can be applied to real data, we consider a dataset of directions of sea stars (Fisher, 1993, Example 4.20). The dataset consists of the resultant directions of 22 Sardinian sea stars 11 days after they were displaced from their natural habitat.



* * * Figure 5 about here * * * 

Figure 5(a) plots the resultant directions of the sea stars. Since the data appear symmetric and unimodal, it seems reasonable to fit the von Mises distribution to this dataset. However, as this frame suggests, there are two observations which can be considered possible outliers. Of these two observations, the one at 2.57 seems to be an apparent outlier under the assumption that the observations follow a von Mises distribution, while the one at 5.20 is much more difficult to judge. We fit the von Mises distribution by maximum likelihood and by minimising the type 1 and type 0 divergences, and discuss how these results can be used for detecting outliers. To select the tuning parameters of the type 1 and type 0 estimators, we use three-fold cross-validation (Hastie et al., 2009, Section 7.10.1), implemented as follows. First, divide the dataset $D$ into three subsets $D_1$, $D_2$ and $D_3$. Define

$$
\mathrm{CV}(\hat f, \alpha) = \frac{1}{N}\sum_{j=1}^{N} L\left\{x_j, \hat f^{-r(j)}(\cdot, \alpha)\right\}, \qquad (21)
$$



where $\hat f^{-l}(x, \alpha)$ is the estimated density with tuning parameter $\alpha$ based on the subset $D \setminus D_l$ of the data, $N$ is the sample size of the dataset, and $r(k)$ is an index function defined as $r(k) = l$ for $x_k \in D_l$. Here we define the loss functions $L$ for the type 1 and type 0 estimators as $L\{y, \hat f^{-l}(\cdot, \alpha)\} = d_{0.6}(g_y, \hat f^{-l})$, where $d_{0.6}$ denotes the Basu et al. divergence (8) with $\beta = 0.6$ and the Jones et al. divergence (16) with $\gamma = 0.6$, respectively, and $g_y$ is the probability function of a point distribution with singularity at $Y = y$. The tuning parameter is then estimated by the minimiser of $\mathrm{CV}(\hat f, \alpha)$. Figures 5(b) and (c) show the values of $\mathrm{CV}(\hat f, \alpha)$ for the type 1 and type 0 estimators, respectively, for $\alpha = h/100$ $(h = 1, \ldots, 100)$. The two curves behave somewhat similarly when the tuning parameters take values between 0 and 0.45, while they look different for tuning parameters greater than 0.45. These frames suggest that, for tuning parameters greater than 0.45, the values of CV are more stable for the Basu et al. divergence than for the Jones et al. divergence.
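The criterion (21) can be sketched for the type 0 estimator in the circular case. With $g_y$ a point mass at $y$, the Jones et al. type 0 divergence reduces, up to a term not involving $f$, to $-\log f(y) + (1+\gamma)^{-1}\log\int f^{1+\gamma}$, and for $\mathrm{vM}_2$ the integral equals $C^{1+\gamma}\,2\pi I_0\{(1+\gamma)\kappa\}$ with $C = \{2\pi I_0(\kappa)\}^{-1}$. The sketch assumes this form of the divergence, the fold assignment $r(j) = j \bmod 3$ and the weighted-mean fixed point for the type 0 fit; these choices and all names are ours, not the paper's code.

```python
import math
import random


def a2(kappa):
    """A_2(kappa) = I_1(kappa) / I_0(kappa) via a backward continued fraction."""
    r = 0.0
    for k in range(int(2 * kappa) + 60, 0, -1):
        r = kappa / (2.0 * k + kappa * r)
    return r


def a2_inv(rbar):
    """Bisection inverse of the increasing, bounded function A_2."""
    lo, hi = 0.0, 1.0
    while a2(hi) < rbar and hi < 1e4:
        lo, hi = hi, 2.0 * hi
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if a2(mid) < rbar else (lo, mid)
    return 0.5 * (lo + hi)


def type0_xi(theta, gamma, n_iter=500, tol=1e-8):
    """Assumed type 0 fixed point: weighted resultant = A_2((1+gamma)kappa) mu."""
    cs = [(math.cos(t), math.sin(t)) for t in theta]
    xi = (0.0, 0.0)  # xi = 0 gives equal weights on the first pass
    for _ in range(n_iter):
        proj = [xi[0] * c + xi[1] * s for c, s in cs]
        top = max(proj)
        w = [math.exp(gamma * (q - top)) for q in proj]
        total = sum(w)
        mc = sum(wi * c for wi, (c, _) in zip(w, cs)) / total
        ms = sum(wi * s for wi, (_, s) in zip(w, cs)) / total
        r = math.hypot(mc, ms)
        kappa = a2_inv(r) / (1.0 + gamma)
        new = (kappa * mc / r, kappa * ms / r)
        if math.hypot(new[0] - xi[0], new[1] - xi[1]) < tol:
            return new
        xi = new
    return xi


def i0(x):
    """I_0(x) from its power series sum_k (x^2 / 4)^k / (k!)^2."""
    term = total = 1.0
    q, k = 0.25 * x * x, 0
    while term > 1e-17 * total:
        k += 1
        term *= q / (k * k)
        total += term
    return total


def type0_point_loss(y, xi, g=0.6):
    """Type 0 divergence between a point mass at y and vM_2(xi), without the
    xi-free term: -log f(y) + log(int f^{1+g}) / (1 + g)."""
    kappa, mu = math.hypot(*xi), math.atan2(xi[1], xi[0])
    log_c = -math.log(2.0 * math.pi * i0(kappa))  # log normalising constant
    log_f = log_c + kappa * math.cos(y - mu)
    log_int = (1.0 + g) * log_c + math.log(2.0 * math.pi * i0((1.0 + g) * kappa))
    return -log_f + log_int / (1.0 + g)


def cv_score(data, alpha, folds=3, g=0.6):
    """CV(f_hat, alpha) as in (21), with fold r(j) = j mod folds (assumed)."""
    total = 0.0
    for fold in range(folds):
        train = [x for j, x in enumerate(data) if j % folds != fold]
        xi = type0_xi(train, alpha)
        total += sum(type0_point_loss(x, xi, g)
                     for j, x in enumerate(data) if j % folds == fold)
    return total / len(data)
```

Evaluating `cv_score` on the grid $\alpha = h/100$ and taking the minimiser mirrors the selection rule described above; a type 1 version would swap in the Basu et al. loss and a type 1 fit.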

* * * Table 4 about here * * * 

Table 4 shows the estimates of the parameters and tuning parameters for the maximum likelihood, type 1 and type 0 estimators. The maximum likelihood estimates are computed for the full dataset, for the dataset excluding the observation at 2.57, and for the dataset excluding the observations at 2.57 and 5.20. A comparison among the maximum likelihood estimates suggests that these possible outliers do not influence the estimate of the location parameter $\mu$ significantly. On the other hand, the estimate of the concentration parameter $\kappa$ seems to be influenced by the possible outliers. Both the type 1 and type 0 estimates are similar to the maximum likelihood estimate for the dataset excluding the one observation at 2.57, implying that the dataset actually includes only one outlier, at 2.57. This conclusion coincides with that of Fisher (1993, Example 4.20), who reached the same conclusion using his test of discordancy for von Mises data. Figures 5(d) and (e) display Q-Q plots for the data excluding the one outlier for the type 1 and type 0 estimators, respectively, in which quantiles of the fitted distribution (horizontal axis) are plotted against those of the empirical distribution (vertical axis). These plots suggest that the estimated model provides a satisfactory fit to the dataset.

8 Discussion 



As pointed out by Watson (1983) and other references, the maximum likelihood estimator of the parameter of the von Mises-Fisher distribution is known not to be robust. In particular, as discussed in Section 2.3, a robust estimator is required especially when observations are concentrated toward a certain direction. Lenth (1981) briefly considered an algorithm to estimate both location and concentration parameters simultaneously. However, as discussed in Section 5.2, Lenth's estimator can be used only for the circular case of the distribution and is not Fisher consistent. In this paper we proposed two families of robust estimators which enable us to estimate both location and concentration parameters simultaneously for the general von Mises-Fisher distribution. Both estimators can be derived as the minimisers of divergences proposed by Basu et al. (1998) and Jones et al. (2001). It follows from the general theory that properties including consistency and asymptotic normality hold for the estimators. In addition, we showed that our estimators have some special features. For example, they can be obtained numerically through fairly simple algorithms. Also, the influence functions and asymptotic covariance matrices of both estimators can be expressed using the modified Bessel functions of the first kind. Simulations suggest that the performance of both estimators is quite satisfactory and, in particular, that the proposed estimators greatly outperform the maximum likelihood estimator if the distributions underlying the data are concentrated toward a certain direction. Possible future work includes robust estimation of parameters for extended families of distributions such as those proposed by Kent (1982) and Jones and Pewsey (2005). In particular, robust methods for distributions with weak symmetry properties would be desirable.
A Proof of Theorem 1



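From M-estimation theory (Hampel et al., 1986, Section 4.2c), the influence function of the maximum likelihood estimator $\hat\xi$ is of the form

$$
\mathrm{IF}(G, x) = \{M(\xi)\}^{-1}\psi(x, \xi),
$$

where

$$
\psi(x, \xi) = \frac{\partial}{\partial\xi}\log f_\xi(x), \qquad M(\xi) = -\int \frac{\partial}{\partial\xi'}\psi(x, \xi)\,dF(x).
$$

After some algebra, it follows that

$$
\psi(x, \xi) = \frac{\partial}{\partial\xi}\log f_\xi(x) = \frac{\partial}{\partial\xi}\left\{\frac{p-2}{2}\log\|\xi\| - \log I_{(p-2)/2}(\|\xi\|) + \xi'x\right\} = x - A_p(\|\xi\|)\frac{\xi}{\|\xi\|},
$$

where the third equality derives from the following formula (Abramowitz and Stegun, 1970, 9.6.26):

$$
\frac{d}{ds}I_\nu(s) = I_{\nu+1}(s) + \frac{\nu}{s}I_\nu(s).
$$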
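Using this result, $M(\xi)$ can be calculated as

$$
\begin{aligned}
M(\xi) &= -\int \frac{\partial}{\partial\xi'}\psi(x, \xi)\,dF(x) = \int \frac{\partial}{\partial\xi'}\left\{A_p(\|\xi\|)\frac{\xi}{\|\xi\|}\right\}dF(x) \\
&= \frac{A_p(\|\xi\|)}{\|\xi\|}\mathbf{I}_p + \left\{\left.\frac{d}{ds}A_p(s)\right|_{s=\|\xi\|} - \frac{A_p(\|\xi\|)}{\|\xi\|}\right\}\frac{\xi\xi'}{\|\xi\|^2} \\
&= \frac{A_p(\|\xi\|)}{\|\xi\|}\mathbf{I}_p + \left\{1 - A_p^2(\|\xi\|) - \frac{p\,A_p(\|\xi\|)}{\|\xi\|}\right\}\frac{\xi\xi'}{\|\xi\|^2}, \qquad (22)
\end{aligned}
$$

where $\mathbf{I}_p$ is the $p \times p$ identity matrix and the fourth equality holds due to the following formula (Mardia and Jupp, 1999, Appendix 1, (A.14); Jammalamadaka and SenGupta, 2001, p. 289):

$$
\left.\frac{d}{ds}A_p(s)\right|_{s=t} = 1 - A_p^2(t) - \frac{p-1}{t}A_p(t). \qquad (23)
$$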



Thus we obtain Theorem 1. □ 
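Formula (23) and the resulting expression (22) for $M(\xi)$ can be checked by finite differences. In the sketch below (function names hypothetical, with $A_p$ evaluated by a backward continued fraction for the Bessel ratio), `dA_formula` is the right-hand side of (23) and `M_formula` is the last line of (22); the latter is compared with a central-difference Jacobian of $\xi \mapsto A_p(\|\xi\|)\xi/\|\xi\|$.

```python
import math


def A_p(p, kappa):
    """A_p(kappa) = I_{p/2}(kappa) / I_{(p-2)/2}(kappa), backward continued fraction."""
    nu, r = (p - 2) / 2.0, 0.0
    for k in range(int(2 * kappa) + 60, 0, -1):
        r = kappa / (2.0 * (nu + k) + kappa * r)
    return r


def dA_formula(p, t):
    """Right-hand side of (23): 1 - A_p(t)^2 - (p - 1) A_p(t) / t."""
    a = A_p(p, t)
    return 1.0 - a * a - (p - 1) * a / t


def M_formula(xi):
    """Last line of (22): (A/s) I + {1 - A^2 - p A / s} xi xi' / s^2, s = ||xi||."""
    p = len(xi)
    s = math.sqrt(sum(v * v for v in xi))
    a = A_p(p, s)
    coef = 1.0 - a * a - p * a / s
    return [[(a / s if i == j else 0.0) + coef * xi[i] * xi[j] / s ** 2
             for j in range(p)] for i in range(p)]


def mean_direction(xi):
    """The map xi -> A_p(||xi||) xi / ||xi|| whose Jacobian is M(xi)."""
    s = math.sqrt(sum(v * v for v in xi))
    a = A_p(len(xi), s)
    return [a * v / s for v in xi]


def M_numeric(xi, h=1e-6):
    """Central-difference Jacobian of mean_direction at xi."""
    p = len(xi)
    jac = [[0.0] * p for _ in range(p)]
    for j in range(p):
        up = list(xi)
        down = list(xi)
        up[j] += h
        down[j] -= h
        g_up, g_down = mean_direction(up), mean_direction(down)
        for i in range(p):
            jac[i][j] = (g_up[i] - g_down[i]) / (2.0 * h)
    return jac
```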



B Proof of Theorem 3

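It is clear that $d_0(g, f_\xi) = d_{\mathrm{KL}}(g, f_\xi)$. We consider the case $\beta > 0$:

$$
\begin{aligned}
d_\beta(g, f_\xi) ={}& \frac{1}{\beta(1+\beta)}\int g^{1+\beta}\,dx - \frac{1}{\beta}\left\{\frac{\|\xi\|^{(p-2)/2}}{(2\pi)^{p/2}I_{(p-2)/2}(\|\xi\|)}\right\}^{\beta}\int \exp(\beta\xi'x)\,g(x)\,dx \\
&+ \frac{1}{1+\beta}\left\{\frac{\|\xi\|^{(p-2)/2}}{(2\pi)^{p/2}I_{(p-2)/2}(\|\xi\|)}\right\}^{1+\beta}\int \exp\{(1+\beta)\xi'x\}\,dx.
\end{aligned}
$$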

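Using the fact that $\int f_{(1+\beta)\xi}\,dx = 1$, the integral in the third term of the equation can be expressed as

$$
\begin{aligned}
\frac{1}{1+\beta}\int f_\xi^{1+\beta}\,dx &= \frac{1}{1+\beta}\left\{\frac{\|\xi\|^{(p-2)/2}}{(2\pi)^{p/2}I_{(p-2)/2}(\|\xi\|)}\right\}^{1+\beta}\frac{(2\pi)^{p/2}I_{(p-2)/2}\{(1+\beta)\|\xi\|\}}{\{(1+\beta)\|\xi\|\}^{(p-2)/2}} \\
&= \frac{\|\xi\|^{(p-2)\beta/2}\,I_{(p-2)/2}\{(1+\beta)\|\xi\|\}}{(2\pi)^{p\beta/2}(1+\beta)^{p/2}\{I_{(p-2)/2}(\|\xi\|)\}^{1+\beta}}.
\end{aligned}
$$

□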
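The expression for $\int f_\xi^{1+\beta}\,dx$ obtained in this proof, namely $\|\xi\|^{(p-2)\beta/2}I_{(p-2)/2}\{(1+\beta)\|\xi\|\}/[(2\pi)^{p\beta/2}(1+\beta)^{(p-2)/2}\{I_{(p-2)/2}(\|\xi\|)\}^{1+\beta}]$ (that is, without the factor $1/(1+\beta)$), can be verified numerically. The sketch below computes $I_\nu$ by its power series, which is adequate for moderate arguments, and compares the formula with the closed-form sphere integral for $p = 3$ and with midpoint quadrature on the circle for $p = 2$. Function names are hypothetical.

```python
import math


def bessel_i(nu, x):
    """I_nu(x) from its power series sum_k (x/2)^(2k+nu) / (k! Gamma(k+nu+1))."""
    if x == 0.0:
        return 1.0 if nu == 0 else 0.0
    log_half = math.log(0.5 * x)
    total, k = 0.0, 0
    while True:
        term = math.exp((2 * k + nu) * log_half
                        - math.lgamma(k + 1) - math.lgamma(k + nu + 1))
        total += term
        if k > 0.5 * x and term < 1e-17 * total:  # past the peak and negligible
            return total
        k += 1


def int_f_power(p, s, beta):
    """int f_xi^{1+beta} dx over the sphere from the closed form; s = ||xi||."""
    nu = (p - 2) / 2.0
    numer = s ** (nu * beta) * bessel_i(nu, (1.0 + beta) * s)
    denom = ((2.0 * math.pi) ** (p * beta / 2.0) * (1.0 + beta) ** nu
             * bessel_i(nu, s) ** (1.0 + beta))
    return numer / denom


def int_f_power_circle_quadrature(s, beta, m=4000):
    """Midpoint rule for the same integral when p = 2."""
    c = 1.0 / (2.0 * math.pi * bessel_i(0, s))  # vM_2 normalising constant
    h = 2.0 * math.pi / m
    return h * sum((c * math.exp(s * math.cos(-math.pi + (i + 0.5) * h))) ** (1.0 + beta)
                   for i in range(m))
```

For $p = 3$ the density constant is $\kappa/(4\pi\sinh\kappa)$ and $\int\exp(a\mu'x)\,dx = 4\pi\sinh a/a$, which gives an independent closed form to compare against.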
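C Proof of Theorem 3

It is easy to obtain the first term of $\psi_\beta$. We consider the second term of $\psi_\beta$, namely, $-\int f_\xi^{1+\beta}(y)\,u_\xi(y)\,dy$, where $u_\xi(y) = y - A_p(\|\xi\|)\xi/\|\xi\|$. More specifically, the integral can be expressed as

$$
\begin{aligned}
\int f_\xi^{1+\beta}(y)\,u_\xi(y)\,dy &= C_p^{1+\beta}(\|\xi\|)\int \exp\{(1+\beta)\xi'y\}\left\{y - A_p(\|\xi\|)\frac{\xi}{\|\xi\|}\right\}dy \\
&= C_p^{1+\beta}(\|\xi\|)\left[\int y\exp\{(1+\beta)\xi'y\}\,dy - A_p(\|\xi\|)\frac{\xi}{\|\xi\|}\int \exp\{(1+\beta)\xi'y\}\,dy\right], \qquad (24)
\end{aligned}
$$

where $C_p(\|\xi\|) = \|\xi\|^{(p-2)/2}/\{(2\pi)^{p/2}I_{(p-2)/2}(\|\xi\|)\}$ denotes the normalising constant of $\mathrm{vM}_p(\xi)$.

The first integral can be calculated by using the fact that, if $X \sim \mathrm{vM}_p(\xi)$, then $E(X) = A_p(\|\xi\|)\xi/\|\xi\|$ (see, for example, Mardia and Jupp (1999, p. 169)). From this, it immediately follows that

$$
\int y\exp\{(1+\beta)\xi'y\}\,dy = A_p\{(1+\beta)\|\xi\|\}\frac{\xi}{\|\xi\|}\cdot\frac{(2\pi)^{p/2}I_{(p-2)/2}\{(1+\beta)\|\xi\|\}}{\{(1+\beta)\|\xi\|\}^{(p-2)/2}}.
$$

The second integral of (24) is calculated in essentially the same way as the normalising constant of the von Mises-Fisher density $\mathrm{vM}_p\{(1+\beta)\xi\}$. Thus we obtain the theorem. □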
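For $p = 2$, the expression for the first integral in (24) reduces to $\int y\exp\{(1+\beta)\xi'y\}\,dy = 2\pi I_1\{(1+\beta)\|\xi\|\}\xi/\|\xi\|$, because $A_2(s)\cdot 2\pi I_0(s) = 2\pi I_1(s)$. The sketch below compares this with midpoint quadrature over the circle for a $\xi$ not aligned with an axis; $I_\nu$ is again computed by its power series, and the function names are hypothetical.

```python
import math


def bessel_i(nu, x):
    """I_nu(x) from its power series sum_k (x/2)^(2k+nu) / (k! Gamma(k+nu+1))."""
    if x == 0.0:
        return 1.0 if nu == 0 else 0.0
    log_half = math.log(0.5 * x)
    total, k = 0.0, 0
    while True:
        term = math.exp((2 * k + nu) * log_half
                        - math.lgamma(k + 1) - math.lgamma(k + nu + 1))
        total += term
        if k > 0.5 * x and term < 1e-17 * total:
            return total
        k += 1


def first_moment_quadrature(xi, beta, m=4000):
    """Midpoint rule for int y exp{(1 + beta) xi'y} dy over the unit circle."""
    h = 2.0 * math.pi / m
    acc = [0.0, 0.0]
    for i in range(m):
        th = -math.pi + (i + 0.5) * h
        y = (math.cos(th), math.sin(th))
        weight = math.exp((1.0 + beta) * (xi[0] * y[0] + xi[1] * y[1]))
        acc[0] += y[0] * weight * h
        acc[1] += y[1] * weight * h
    return acc


def first_moment_closed(xi, beta):
    """A_2((1+beta)s) xi/s times the normaliser 2 pi I_0((1+beta)s), s = ||xi||."""
    s = math.hypot(xi[0], xi[1])
    a = (1.0 + beta) * s
    a2 = bessel_i(1, a) / bessel_i(0, a)  # A_2 evaluated at (1 + beta) s
    scale = a2 * 2.0 * math.pi * bessel_i(0, a) / s
    return [scale * xi[0], scale * xi[1]]
```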



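D Proof of Lemma 1

For convenience, write

$$
T_w = \frac{\int_{-\pi}^{\pi} w(\theta)\cos\theta\, f(\theta)\,d\theta}{\int_{-\pi}^{\pi} w(\theta) f(\theta)\,d\theta} \quad \text{and} \quad T = \int_{-\pi}^{\pi}\cos\theta\, f(\theta)\,d\theta.
$$

Then $T_w$ can be expressed as $T_w = \int_{-\pi}^{\pi} w'(\theta)\cos\theta\, f(\theta)\,d\theta$, where $w'(\theta) = w(\theta)/\int_{-\pi}^{\pi} w(u)f(u)\,du$. With this convention, it holds that

$$
T_w = \int_{-\pi}^{\pi} w'(\theta)\cos\theta\, f(\theta)\,d\theta = T + \int_{-\pi}^{\pi}\{w'(\theta) - 1\}\cos\theta\, f(\theta)\,d\theta. \qquad (25)
$$

The second term of (25) can be decomposed into two terms as

$$
\int_{-\pi}^{\pi}\{w'(\theta) - 1\}\cos\theta\, f(\theta)\,d\theta = \int_{w'(\theta) > 1} + \int_{w'(\theta) \le 1}. \qquad (26)
$$

Due to Properties 1-3 of $w(\theta)$, it is easy to see that there exists a constant $d \in [-1, 1)$ such that $\{\theta \in [-\pi, \pi) \mid w'(\theta) > 1\} = \{\theta \in [-\pi, \pi) \mid \cos\theta > d\}$. Then the following inequality holds for (26):

$$
\begin{aligned}
\int_{w'(\theta) > 1} + \int_{w'(\theta) \le 1} &= \int_{\cos\theta > d}\{w'(\theta) - 1\}\cos\theta\, f(\theta)\,d\theta + \int_{\cos\theta \le d}\{w'(\theta) - 1\}\cos\theta\, f(\theta)\,d\theta \\
&\ge d\int_{\cos\theta > d}\{w'(\theta) - 1\}f(\theta)\,d\theta + d\int_{\cos\theta \le d}\{w'(\theta) - 1\}f(\theta)\,d\theta \\
&= d\left\{\int_{-\pi}^{\pi} w'(\theta)f(\theta)\,d\theta - \int_{-\pi}^{\pi} f(\theta)\,d\theta\right\} = 0.
\end{aligned}
$$

Thus we obtain $T_w \ge T$. Since $f(\theta) > 0$ on some subset $A$ which is not a null set, it can be seen that the equality holds if and only if $w'(\theta) = 1$. □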



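E Proof of Theorem 7

We show that $\hat\kappa$ is not a Fisher consistent estimator. Without loss of generality, assume $\mu = 0$. Then it is easy to see that $\int w(\theta)\sin\theta\, f_{\mathrm{vM}}(\theta)\,d\theta = 0$, where $f_{\mathrm{vM}}$ is the von Mises $\mathrm{vM}_2\{(\kappa, 0)'\}$ density, since the integrand is an odd function. Then, from Lemma 1, we immediately obtain

$$
\frac{\int w(\theta)\cos\theta\, f_{\mathrm{vM}}(\theta)\,d\theta}{\int w(\theta) f_{\mathrm{vM}}(\theta)\,d\theta} > \int \cos\theta\, f_{\mathrm{vM}}(\theta)\,d\theta = A_2(\kappa).
$$

Therefore the population value of $R_w$, which equals the left-hand side above, exceeds $A_2(\kappa)$, so that $A_2^{-1}(R_w) \neq \kappa$. □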
Table 1. Estimates of the relative mean squared errors of the minimum divergence estimators with respect to the maximum likelihood estimator for some selected sample sizes and tuning parameters for the von Mises-Fisher distribution $\mathrm{vM}_p(\xi)$ with $p = 2$ and $\xi = (2.37, 0)'$ and with $p = 3$ and $\xi = (3.99, 0, 0)'$.

p = 2 p = 3

n = 10 n = 20 n = 30 n = 50 n = 100 n = 10 n = 20 n = 30 n = 50 n = 100

Type 1 $\beta = 0.02$ 0.994 0.996 0.996 1.000 0.999 0.991 0.996 0.996 0.999 1.000

$\beta = 0.05$ 0.993 0.995 0.995 1.003 1.001 0.985 0.997 0.996 1.003 1.003

$\beta = 0.1$ 1.016 1.012 1.008 1.021 1.012 1.006 1.019 1.008 1.021 1.019

$\beta = 0.25$ 5.464 1.231 1.154 1.151 1.104 1.465 1.213 1.132 1.146 1.126

$\beta = 0.5$ 14.836 2.357 1.711 1.559 1.393 6.512 1.969 1.568 1.515 1.418

$\beta = 0.75$ 21.547 4.314 2.735 2.232 1.723 19.290 3.028 2.159 1.994 1.752

Type 0 $\gamma = 0.02$ 0.994 0.996 0.996 1.000 0.999 0.991 0.996 0.996 0.999 1.000

$\gamma = 0.05$ 0.993 0.995 0.995 1.003 1.001 0.985 0.997 0.996 1.003 1.003

$\gamma = 0.1$ 1.016 1.013 1.008 1.021 1.012 1.007 1.019 1.008 1.021 1.019

$\gamma = 0.25$ 6.179 1.247 1.164 1.158 1.109 1.618 1.234 1.143 1.156 1.133

$\gamma = 0.5$ 81.673 3.501 2.036 1.702 1.479 458.822 3.312 1.860 1.697 1.530

$\gamma = 0.75$ 737.175 58.871 137.763 4.240 2.135 2892.802 820.520 178.742 4.206 2.285
Table 2. Estimates of the relative mean squared errors of the minimum divergence estimators with respect to the maximum likelihood estimator for some selected contamination ratios and tuning parameters for a mixture of the von Mises-Fisher and the uniform distributions $(1-\varepsilon)\mathrm{vM}_p(\xi) + \varepsilon U_p$ with (a) $p = 2$, $\xi = (0.52, 0)'$ and $p = 3$, $\xi = (0.78, 0, 0)'$, (b) $p = 2$, $\xi = (1.16, 0)'$ and $p = 3$, $\xi = (1.80, 0, 0)'$, (c) $p = 2$, $\xi = (2.37, 0)'$ and $p = 3$, $\xi = (3.99, 0, 0)'$, and (d) $p = 2$, $\xi = (10.27, 0)'$ and $p = 3$, $\xi = (20.0, 0, 0)'$, for each $\varepsilon = 0.02, 0.05, 0.1$ and $0.2$.



(a) $p = 2$ and $\xi = (0.52, 0)'$; $p = 3$ and $\xi = (0.78, 0, 0)'$

p = 2 p = 3

$\varepsilon = 0.02$ $\varepsilon = 0.05$ $\varepsilon = 0.1$ $\varepsilon = 0.2$ $\varepsilon = 0.02$ $\varepsilon = 0.05$ $\varepsilon = 0.1$ $\varepsilon = 0.2$

Type 1 $\beta = 0.02$ 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000

$\beta = 0.05$ 1.001 1.001 1.000 1.000 1.000 1.001 1.000 0.999

$\beta = 0.1$ 1.002 1.002 1.001 1.001 1.001 1.002 1.001 0.999

$\beta = 0.25$ 1.009 1.009 1.006 1.004 1.011 1.012 1.008 1.001

$\beta = 0.5$ 1.030 1.030 1.021 1.013 1.047 1.047 1.035 1.014

$\beta = 0.75$ 1.064 1.062 1.046 1.028 1.107 1.106 1.081 1.040

Type 0 $\gamma = 0.02$ 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000

$\gamma = 0.05$ 1.001 1.001 1.000 1.000 1.000 1.001 1.000 0.999

$\gamma = 0.1$ 1.002 1.002 1.001 1.001 1.001 1.002 1.001 0.999

$\gamma = 0.25$ 1.009 1.009 1.006 1.004 1.011 1.012 1.008 1.001

$\gamma = 0.5$ 1.031 1.031 1.022 1.013 1.049 1.050 1.037 1.015

$\gamma = 0.75$ 1.071 1.070 1.051 1.030 1.122 1.122 1.093 1.046
(b) $p = 2$ and $\xi = (1.16, 0)'$; $p = 3$ and $\xi = (1.80, 0, 0)'$

p = 2 p = 3

$\varepsilon = 0.02$ $\varepsilon = 0.05$ $\varepsilon = 0.1$ $\varepsilon = 0.2$ $\varepsilon = 0.02$ $\varepsilon = 0.05$ $\varepsilon = 0.1$ $\varepsilon = 0.2$

Type 1 $\beta = 0.02$ 1.000 1.000 0.999 0.998 1.000 0.999 0.996 0.994

$\beta = 0.05$ 1.001 0.999 0.997 0.995 1.001 0.998 0.991 0.985

$\beta = 0.1$ 1.004 1.000 0.995 0.990 1.006 0.998 0.983 0.970

$\beta = 0.25$ 1.024 1.012 0.994 0.977 1.040 1.016 0.973 0.930

$\beta = 0.5$ 1.098 1.065 1.015 0.960 1.161 1.100 1.002 0.879

$\beta = 0.75$ 1.213 1.155 1.063 0.952 1.337 1.232 1.079 0.855

Type 0 $\gamma = 0.02$ 1.000 1.000 0.999 0.998 1.000 0.999 0.996 0.994

$\gamma = 0.05$ 1.001 0.999 0.997 0.995 1.001 0.998 0.991 0.985

$\gamma = 0.1$ 1.004 1.000 0.995 0.990 1.006 0.998 0.983 0.970

$\gamma = 0.25$ 1.025 1.013 0.994 0.976 1.042 1.017 0.974 0.929

$\gamma = 0.5$ 1.111 1.074 1.020 0.959 1.190 1.120 1.012 0.874

$\gamma = 0.75$ 1.296 1.216 1.103 0.960 1.515 1.355 1.173 0.854
(c) $p = 2$ and $\xi = (2.37, 0)'$; $p = 3$ and $\xi = (3.99, 0, 0)'$

p = 2 p = 3

$\varepsilon = 0.02$ $\varepsilon = 0.05$ $\varepsilon = 0.1$ $\varepsilon = 0.2$ $\varepsilon = 0.02$ $\varepsilon = 0.05$ $\varepsilon = 0.1$ $\varepsilon = 0.2$

Type 1 $\beta = 0.02$ 0.999 0.991 0.986 0.989 0.990 0.967 0.962 0.974

$\beta = 0.05$ 1.001 0.980 0.964 0.972 0.982 0.922 0.907 0.935

$\beta = 0.1$ 1.011 0.965 0.930 0.943 0.980 0.862 0.822 0.869

$\beta = 0.25$ 1.097 0.961 0.840 0.857 1.058 0.777 0.628 0.682

$\beta = 0.5$ 1.386 1.084 0.761 0.726 1.325 0.838 0.508 0.469

$\beta = 0.75$ 1.736 1.289 0.760 0.640 1.616 0.968 0.511 0.391

Type 0 $\gamma = 0.02$ 0.999 0.991 0.986 0.989 0.990 0.967 0.962 0.974

$\gamma = 0.05$ 1.001 0.980 0.964 0.972 0.982 0.922 0.907 0.935

$\gamma = 0.1$ 1.011 0.965 0.929 0.943 0.980 0.861 0.821 0.869

$\gamma = 0.25$ 1.102 0.962 0.837 0.854 1.065 0.776 0.619 0.673

$\gamma = 0.5$ 1.480 1.141 0.760 0.705 1.446 0.900 0.502 0.421

$\gamma = 0.75$ 2.245 1.660 0.851 0.619 2.158 1.308 0.616 0.342
(d) $p = 2$ and $\xi = (10.27, 0)'$; $p = 3$ and $\xi = (20.00, 0, 0)'$

p = 2 p = 3

$\varepsilon = 0.02$ $\varepsilon = 0.05$ $\varepsilon = 0.1$ $\varepsilon = 0.2$ $\varepsilon = 0.02$ $\varepsilon = 0.05$ $\varepsilon = 0.1$ $\varepsilon = 0.2$

Type 1 $\beta = 0.02$ 0.841 0.891 0.945 0.983 0.691 0.792 0.894 0.967

$\beta = 0.05$ 0.635 0.723 0.851 0.953 0.372 0.486 0.700 0.903

$\beta = 0.1$ 0.412 0.461 0.665 0.891 0.194 0.172 0.332 0.733

$\beta = 0.25$ 0.297 0.157 0.198 0.535 0.173 0.071 0.054 0.114

$\beta = 0.5$ 0.365 0.153 0.109 0.164 0.220 0.081 0.049 0.065

$\beta = 0.75$ 0.452 0.180 0.115 0.144 0.271 0.095 0.057 0.075

Type 0 $\gamma = 0.02$ 0.841 0.891 0.945 0.983 0.691 0.792 0.894 0.967

$\gamma = 0.05$ 0.634 0.723 0.851 0.953 0.371 0.485 0.699 0.903

$\gamma = 0.1$ 0.411 0.459 0.663 0.890 0.193 0.168 0.326 0.730

$\gamma = 0.25$ 0.300 0.154 0.186 0.508 0.177 0.070 0.048 0.081

$\gamma = 0.5$ 0.400 0.169 0.109 0.114 0.254 0.094 0.053 0.042

$\gamma = 0.75$ 0.593 0.245 0.145 0.111 0.407 0.146 0.085 0.065
Table 3. Estimates of the relative mean squared errors of the minimum divergence estimators with respect to the maximum likelihood estimator for some selected contamination ratios and tuning parameters for a mixture of the von Mises-Fisher distributions $(1-\varepsilon)\mathrm{vM}_p(\xi) + \varepsilon\,\mathrm{vM}_p(\zeta)$ with (i) $p = 2$, $\xi = (2.37, 0)'$ and $\zeta = (-100, 0)'$ and (ii) $p = 3$, $\xi = (3.99, 0, 0)'$ and $\zeta = (-199, 0, 0)'$ for each $\varepsilon = 0.05, 0.1, 0.2$, and $0.3$.

p = 2 p = 3

$\varepsilon = 0.02$ $\varepsilon = 0.05$ $\varepsilon = 0.1$ $\varepsilon = 0.2$ $\varepsilon = 0.02$ $\varepsilon = 0.05$ $\varepsilon = 0.1$ $\varepsilon = 0.2$

Type 1 $\beta = 0.02$ 0.982 0.969 0.975 0.988 0.939 0.923 0.945 0.975

$\beta = 0.05$ 0.958 0.924 0.937 0.969 0.861 0.813 0.860 0.935

$\beta = 0.1$ 0.925 0.852 0.874 0.936 0.767 0.652 0.720 0.863

$\beta = 0.25$ 0.894 0.674 0.687 0.829 0.697 0.374 0.362 0.601

$\beta = 0.5$ 1.041 0.553 0.448 0.629 0.835 0.328 0.175 0.214

$\beta = 0.75$ 1.273 0.565 0.352 0.451 0.997 0.367 0.168 0.136

Type 0 $\gamma = 0.02$ 0.982 0.969 0.975 0.988 0.939 0.923 0.945 0.975

$\gamma = 0.05$ 0.957 0.924 0.937 0.969 0.861 0.813 0.860 0.935

$\gamma = 0.1$ 0.925 0.851 0.874 0.936 0.766 0.651 0.719 0.863

$\gamma = 0.25$ 0.895 0.669 0.681 0.826 0.700 0.365 0.345 0.588

$\gamma = 0.5$ 1.123 0.563 0.422 0.600 0.931 0.366 0.173 0.145

$\gamma = 0.75$ 1.760 0.738 0.412 0.389 1.379 0.533 0.243 0.136






Table 4. Estimates of the parameters and tuning parameters for the maximum likelihood estimators and the two minimum divergence estimators. The maximum likelihood estimates are obtained for the data excluding no observations, the observation at 2.57, and the observations at 2.57 and 5.20. The parameters $\mu$ and $\kappa$ are defined by $\mu = \mathrm{Arg}(\xi_1 + i\xi_2)$ and $\kappa = (\xi_1^2 + \xi_2^2)^{1/2}$, respectively, where $\xi = (\xi_1, \xi_2)'$.

MLE; MLE (one sample excluded); MLE (two samples excluded); Type 1; Type 0

tuning parameter - - - 0.59 0.48

$\mu$ 0.0541 0.0232 0.0712 0.0377 0.0380

$\kappa$ 3.30 5.74 7.66 5.86 5.98
Figure 1: Influence functions of maximum likelihood estimators for (a) $\xi = (0.10, 0)'$, (b) $\xi = (0.52, 0)'$, (c) $\xi = (1.16, 0)'$ and (d) $\xi = (2.37, 0)'$ for the $\mathrm{vM}_2(\xi)$ model. For convenience, the norms of the influence functions are divided by four. In each frame, the white dot denotes the origin, while the black one denotes $A_2(\|\xi\|)\xi/\|\xi\|$.
Figure 2: (a) Plot of the von Mises-Fisher density $\mathrm{vM}_2\{(5, 0)'\}$ (solid), the unit circle (dashed), the disc $N$ (dotted) and the area $A_{r<\delta}$ (bold and solid) with $\alpha = 0.05$, and (b) plot of $\delta$ satisfying Equation (6) with $\alpha = 0.05$ and $p = 2$ (solid), 3 (dashed), 4 (dot-dashed) and 5 (dotted) as a function of $\|\xi\|$.
Figure 3: Influence functions (13) of the type 1 estimator for the $\mathrm{vM}_2\{(2.37, 0)'\}$ model with (a) $\beta = 0.05$, (b) $\beta = 0.1$, (c) $\beta = 0.25$ and (d) $\beta = 0.5$. For convenience, the norms of the influence functions are divided by four. The white dot denotes the origin and the dotted line represents the $\mathrm{vM}_2\{(2.37, 0)'\}$ density.
Figure 4: Influence functions (19) of the type 0 estimator for the $\mathrm{vM}_2\{(2.37, 0)'\}$ model with (a) $\gamma = 0.05$, (b) $\gamma = 0.1$, (c) $\gamma = 0.25$ and (d) $\gamma = 0.5$. For convenience, the norms of the influence functions are divided by four. The white dot denotes the origin and the dotted line represents the $\mathrm{vM}_2\{(2.37, 0)'\}$ density.
Figure 5: (a) Plot of the resultant directions of 22 sea stars after 11 days of movement; plots of the values of CV (21) for 100 selected values of the tuning parameters between 0 and 1 for (b) the type 1 and (c) the type 0 estimators; and Q-Q plots for the data excluding one outlier for (d) the type 1 estimator and (e) the type 0 estimator, where quantiles of the fitted von Mises distribution (x-axis) and of the empirical distribution (y-axis) are plotted.